
Can Transformers In-Context Learn Behavior of a Linear Dynamical System?

Abstract: We investigate whether transformers can learn to track a random process when given observations of a related process and parameters of the dynamical system that relates them as context. More specifically, we consider a finite-dimensional state-space model described by the state transition matrix $F$, measurement matrices $h_1, \dots, h_N$, and the process and measurement noise covariance matrices $Q$ and $R$, respectively; these parameters, randomly sampled, are provided to the transformer along with the observations $y_1,\dots,y_N$ generated by the corresponding linear dynamical system. We argue that in such settings transformers learn to approximate the celebrated Kalman filter, and empirically verify this both for the task of estimating hidden states $\hat{x}_{N|1,2,3,...,N}$ as well as for one-step prediction of the $(N+1)^{st}$ observation, $\hat{y}_{N+1|1,2,3,...,N}$. A further study of the transformer's robustness reveals that its performance is retained even if the model's parameters are partially withheld. In particular, we demonstrate that the transformer remains accurate at the considered task even in the absence of state transition and noise covariance matrices, effectively emulating operations of the Dual-Kalman filter.

Submitted 21 October, 2024; originally announced October 2024.
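
For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of a Kalman filter, not code from the paper. It assumes a scalar state and scalar observations (the abstract uses a matrix $F$ and measurement matrices $h_1, \dots, h_N$), and the function name, argument names, and the prior `x0`, `p0` are hypothetical choices made for illustration.

```python
def kalman_filter_scalar(f, hs, q, r, ys, x0=0.0, p0=1.0):
    """Scalar Kalman filter (illustrative assumption: 1-D state and observations).

    Model: x_t = f * x_{t-1} + w_t,  y_t = h_t * x_t + v_t,
    with Var(w_t) = q and Var(v_t) = r.
    Returns the filtered state estimates and the final error variance.
    """
    x, p = x0, p0  # prior mean and variance of the initial state (assumed)
    estimates = []
    for h, y in zip(hs, ys):
        # Predict: propagate mean and variance through the dynamics.
        x_pred = f * x
        p_pred = f * p * f + q
        # Update: correct the prediction using the new observation.
        gain = p_pred * h / (h * p_pred * h + r)
        x = x_pred + gain * (y - h * x_pred)
        p = (1.0 - gain * h) * p_pred
        estimates.append(x)
    return estimates, p


estimates, variance = kalman_filter_scalar(
    f=1.0, hs=[1.0, 1.0], q=0.0, r=1.0, ys=[2.0, 2.0]
)
```

Given the last filtered estimate and a next measurement coefficient, a one-step observation prediction in this scalar setting would be `h_next * f * estimates[-1]`.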

arXiv:2410.08508 [pdf, other]

Accelerated Distributed Stochastic Non-Convex Optimization over Time-Varying Directed Networks

Authors: Yiyue Chen, Abolfazl Hashemi, Haris Vikalo

Abstract: Distributed stochastic non-convex optimization problems have recently received attention due to the growing interest of signal processing, computer vision, and natural language processing communities in applications deployed over distributed learning systems (e.g., federated learning). We study the setting where the data is distributed across the nodes of a time-varying directed network, a topology suitable for modeling dynamic networks experiencing communication delays and straggler effects. The network nodes, which can access only their local objectives and query a stochastic first-order oracle to obtain gradient estimates, collaborate to minimize a global objective function by exchanging messages with their neighbors. We propose an algorithm, novel to this setting, that leverages stochastic gradient descent with momentum and gradient tracking to solve distributed non-convex optimization problems over time-varying networks. To analyze the algorithm, we tackle the challenges that arise when analyzing dynamic network systems which communicate gradient acceleration components. We prove that the algorithm's oracle complexity is $\mathcal{O}(1/\epsilon^{1.5})$, and that under the Polyak-Łojasiewicz (PL) condition the algorithm converges linearly to a steady error state. The proposed scheme is tested on several learning tasks: a non-convex logistic regression experiment on the MNIST dataset, an image classification task on the CIFAR-10 dataset, and an NLP classification test on the IMDB dataset. We further present numerical simulations with an objective that satisfies the PL condition. The results demonstrate superior performance of the proposed framework compared to the existing related methods.

Submitted 11 October, 2024; originally announced October 2024.

Comments: This work has been accepted at IEEE Transactions on Automatic Control

arXiv:2312.13380 [pdf, other]

Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity

Authors: Yiyue Chen, Haris Vikalo, Chianing Wang

Abstract: Motivated by high resource costs of centralized machine learning schemes as well as data privacy concerns, federated learning (FL) emerged as an efficient alternative that relies on aggregating locally trained models rather than collecting clients' potentially private data. In practice, available resources and data distributions vary from one client to another, creating an inherent system heterogeneity that leads to deterioration of the performance of conventional FL algorithms. In this work, we present a federated quantization-based self-supervised learning scheme (Fed-QSSL) designed to address heterogeneity in FL systems. At clients' side, to tackle data heterogeneity we leverage distributed self-supervised learning while utilizing low-bit quantization to satisfy constraints imposed by local infrastructure and limited communication resources. At server's side, Fed-QSSL deploys de-quantization, weighted aggregation and re-quantization, ultimately creating models personalized to both data distribution as well as specific infrastructure of each client's device. We validated the proposed algorithm on real world datasets, demonstrating its efficacy, and theoretically analyzed impact of low-bit training on the convergence and robustness of the learned models.

Submitted 20 December, 2023; originally announced December 2023.

Comments: This work has been accepted at the 38th AAAI Conference on Artificial Intelligence (AAAI-24)

arXiv:2302.10450 [pdf, other]

Automotive RADAR sub-sampling via object detection networks: Leveraging prior signal information

Authors: Madhumitha Sakthi, Ahmed Tewfik, Marius Arvinte, Haris Vikalo

Abstract: Automotive radar has increasingly attracted attention due to growing interest in autonomous driving technologies. Acquiring situational awareness using multimodal data collected at high sampling rates by various sensing devices including cameras, LiDAR, and radar requires considerable power, memory and compute resources which are often limited at an edge device. In this paper, we present a novel adaptive radar sub-sampling algorithm designed to identify regions that require more detailed/accurate reconstruction based on prior knowledge of environmental conditions, enabling near-optimal performance at considerably lower effective sampling rates. Designed to robustly perform under variable weather conditions, the algorithm was shown on the Oxford raw radar and RADIATE datasets to achieve accurate reconstruction utilizing only 10% of the original samples in good weather and 20% in extreme (snow, fog) weather conditions. A further modification of the algorithm incorporates object motion to enable reliable identification of important regions. This includes monitoring possible future occlusions caused by objects detected in the present frame. Finally, we train a YOLO network on the RADIATE dataset to perform object detection directly on RADAR data and obtain a 6.6% AP50 improvement over the baseline Faster R-CNN network.

Submitted 21 February, 2023; originally announced February 2023.

arXiv:2101.09583 [pdf, other]

Communication-Efficient Variance-Reduced Decentralized Stochastic Optimization over Time-Varying Directed Graphs

Authors: Yiyue Chen, Abolfazl Hashemi, Haris Vikalo

Abstract: We consider the problem of decentralized optimization over time-varying directed networks. The network nodes can access only their local objectives, and aim to collaboratively minimize a global function by exchanging messages with their neighbors. Leveraging sparsification, gradient tracking and variance-reduction, we propose a novel communication-efficient decentralized optimization scheme that is suitable for resource-constrained time-varying directed networks. We prove that in the case of smooth and strongly-convex objective functions, the proposed scheme achieves an accelerated linear convergence rate. To our knowledge, this is the first decentralized optimization framework for time-varying directed networks that achieves such a convergence rate and applies to settings requiring sparsified communication. Experimental results on both synthetic and real datasets verify the theoretical results and demonstrate efficacy of the proposed scheme.

Submitted 2 December, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

arXiv:2011.08295 [pdf, other]

Real-Time Radio Technology and Modulation Classification via an LSTM Auto-Encoder

Authors: Ziqi Ke, Haris Vikalo

Abstract: Identification of the type of communication technology and/or modulation scheme based on detected radio signal are challenging problems encountered in a variety of applications including spectrum allocation and radio interference mitigation. They are rendered difficult due to a growing number of emitter types and varied effects of real-world channels upon the radio signal. Existing spectrum monitoring techniques are capable of acquiring massive amounts of radio and real-time spectrum data using compact sensors deployed in a variety of settings. However, state-of-the-art methods that use such data to classify emitter types and detect communication schemes struggle to achieve required levels of accuracy at a computational efficiency that would allow their implementation on low-cost computational platforms. In this paper, we present a learning framework based on an LSTM denoising auto-encoder designed to automatically extract stable and robust features from noisy radio signals, and infer modulation or technology type using the learned features. The algorithm utilizes a compact neural network architecture readily implemented on a low-cost computational platform while exceeding state-of-the-art accuracy. Results on realistic synthetic as well as over-the-air radio data demonstrate that the proposed framework reliably and efficiently classifies received radio signals, often demonstrating superior performance compared to state-of-the-art methods.

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2005.13189 [pdf, ps, other]

Decentralized Optimization On Time-Varying Directed Graphs Under Communication Constraints

Authors: Yiyue Chen, Abolfazl Hashemi, Haris Vikalo

Abstract: We consider the problem of decentralized optimization where a collection of agents, each having access to a local cost function, communicate over a time-varying directed network and aim to minimize the sum of those functions. In practice, the amount of information that can be exchanged between the agents is limited due to communication constraints. We propose a communication-efficient algorithm for decentralized convex optimization that relies on sparsification of local updates exchanged between neighboring agents in the network. In directed networks, message sparsification alters column-stochasticity -- a property that plays an important role in establishing convergence of decentralized learning tasks. We propose a decentralized optimization scheme that relies on local modification of mixing matrices, and show that it achieves $\mathcal{O}(\frac{\ln T}{\sqrt{T}})$ convergence rate in the considered settings. Experiments validate theoretical results and demonstrate efficacy of the proposed algorithm.

Submitted 30 August, 2021; v1 submitted 27 May, 2020; originally announced May 2020.
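
The abstract above refers to sparsifying the local updates that agents exchange but does not say which sparsifier is used. As an illustration only, the sketch below shows top-k magnitude sparsification, one common choice; treating it as the operator in question is an assumption, and the function name `top_k_sparsify` is hypothetical.

```python
def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries of `vector` and zero out the rest.

    Illustrative assumption: top-k is used here as an example sparsifier;
    the abstract does not specify the operator.
    """
    if k <= 0:
        return [0.0] * len(vector)
    # Indices sorted by decreasing absolute value; keep the first k.
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


message = top_k_sparsify([0.1, -3.0, 2.0, 0.5], k=2)
```

Only the retained entries (and their indices) would need to be transmitted, which is where the communication saving comes from.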

arXiv:1909.12898 [pdf, other]

Identifying Sparse Low-Dimensional Structures in Markov Chains: A Nonnegative Matrix Factorization Approach

Authors: Mahsa Ghasemi, Abolfazl Hashemi, Haris Vikalo, Ufuk Topcu

Abstract: We consider the problem of learning low-dimensional representations for large-scale Markov chains. We formulate the task of representation learning as that of mapping the state space of the model to a low-dimensional state space, called the kernel space. The kernel space contains a set of meta states which are desired to be representative of only a small subset of original states. To promote this structural property, we constrain the number of nonzero entries of the mappings between the state space and the kernel space. By imposing the desired characteristics of the representation, we cast the problem as a constrained nonnegative matrix factorization. To compute the solution, we propose an efficient block coordinate gradient descent and theoretically analyze its convergence properties.

Submitted 7 April, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: Accepted for publication in American Control Conference (ACC) Proceedings, 2020

arXiv:1905.09919 [pdf, other]

Submodular Observation Selection and Information Gathering for Quadratic Models

Authors: Abolfazl Hashemi, Mahsa Ghasemi, Haris Vikalo, Ufuk Topcu

Abstract: We study the problem of selecting the most informative subset of a large observation set to enable accurate estimation of unknown parameters. This problem arises in a variety of settings in machine learning and signal processing including feature selection, phase retrieval, and target localization. Since for quadratic measurement models the moment matrix of the optimal estimator is generally unknown, the majority of prior work resorts to approximation techniques such as linearization of the observation model to optimize the alphabetical optimality criteria of an approximate moment matrix. Conversely, by exploiting a connection to the classical Van Trees' inequality, we derive new alphabetical optimality criteria without distorting the relational structure of the observation model. We further show that under certain conditions on parameters of the problem these optimality criteria are monotone and (weak) submodular set functions. These results enable us to develop an efficient greedy observation selection algorithm uniquely tailored for quadratic models, and provide theoretical bounds on its achievable utility.

Submitted 23 May, 2019; originally announced May 2019.

Comments: To be published in proceedings of International Conference on Machine Learning (ICML) 2019

arXiv:1807.08627 [pdf, other]

Randomized Greedy Sensor Selection: Leveraging Weak Submodularity

Authors: Abolfazl Hashemi, Mahsa Ghasemi, Haris Vikalo, Ufuk Topcu

Abstract: We study the problem of estimating a random process from the observations collected by a network of sensors that operate under resource constraints. When the dynamics of the process and sensor observations are described by a state-space model and the resources are unlimited, the conventional Kalman filter provides the minimum mean-square error (MMSE) estimates. However, at any given time, restrictions on the available communications bandwidth and computational capabilities and/or power impose a limitation on the number of network nodes whose observations can be used to compute the estimates. We formulate the problem of selecting the most informative subset of the sensors as a combinatorial problem of maximizing a monotone set function under a uniform matroid constraint. For the MMSE estimation criterion we show that the maximum element-wise curvature of the objective function satisfies a certain upper-bound constraint and is, therefore, weak submodular. We develop an efficient randomized greedy algorithm for sensor selection and establish guarantees on the estimator's performance in this setting. Extensive simulation results demonstrate the efficacy of the randomized greedy algorithm compared to state-of-the-art greedy and semidefinite programming relaxation methods.

Submitted 19 July, 2018; originally announced July 2018.

Comments: arXiv admin note: text overlap with arXiv:1709.08823

arXiv:1807.07650 [pdf, other]

Near-Optimal Distributed Estimation for a Network of Sensing Units Operating Under Communication Constraints

Authors: Abolfazl Hashemi, Osman Fatih Kilic, Haris Vikalo

Abstract: We study the problem of distributed state estimation in a network of sensing units that can exchange their measurements but the rate of communication between the units is constrained. The units collect noisy, possibly only partial observations of the unknown state; they are assisted by a relay center which can communicate at a higher rate and schedules the exchange of measurements between the units. We consider the task of minimizing the total mean-square estimation error of the network while promoting balance between the individual units' performances. This problem is formulated as the maximization of a monotone objective function subject to a cardinality constraint. By leveraging the notion of weak submodularity, we develop an efficient greedy algorithm for the proposed formulation and show that the greedy algorithm achieves a constant factor approximation of the optimal objective. Our extensive simulation studies illustrate the efficacy of the proposed formulation and the greedy algorithm.

Submitted 19 July, 2018; originally announced July 2018.

arXiv:1807.07222 [pdf, other]

Towards Accelerated Greedy Sampling and Reconstruction of Bandlimited Graph Signals

Authors: Abolfazl Hashemi, Rasoul Shafipour, Haris Vikalo, Gonzalo Mateos

Abstract: We study the problem of sampling and reconstructing spectrally sparse graph signals where the objective is to select a subset of nodes of prespecified cardinality that ensures interpolation of the original signal with the lowest possible reconstruction error. This task is of critical importance in Graph signal processing (GSP) and while existing methods generally provide satisfactory performance, they typically entail a prohibitive computational cost when it comes to the study of large-scale problems. Thus, there is a need for accelerated and efficient methods tailored for high-dimensional and large-scale sampling and reconstruction tasks. To this end, we first consider a non-Bayesian scenario and propose an efficient iterative node sampling procedure that in the noiseless case enables exact recovery of the original signal from the set of selected nodes. In the case of noisy measurements, a bound on the reconstruction error of the proposed algorithm is established. Then, we consider the Bayesian scenario where we formulate the sampling task as the problem of maximizing a monotone weak submodular function, and propose a randomized-greedy algorithm to find a sub-optimal subset of informative nodes. We derive worst-case performance guarantees on the mean-square error achieved by the randomized-greedy algorithm for general non-stationary graph signals.

Submitted 23 November, 2021; v1 submitted 18 July, 2018; originally announced July 2018.

arXiv:1807.07184 [pdf, other]

A Novel Scheme for Support Identification and Iterative Sampling of Bandlimited Graph Signals

Authors: Abolfazl Hashemi, Rasoul Shafipour, Haris Vikalo, Gonzalo Mateos

Abstract: We study the problem of sampling and reconstruction of bandlimited graph signals where the objective is to select a node subset of prescribed cardinality that ensures interpolation of the original signal with the lowest reconstruction error. We propose an efficient iterative selection sampling approach and show that in the noiseless case the original signal is exactly recovered from the set of selected nodes. In the case of noisy measurements, a bound on the reconstruction error of the proposed algorithm is established. We further address the support identification of the bandlimited signal with unknown support and show that under a pragmatic sufficient condition, the proposed framework requires a minimal number of samples to perfectly identify the support. The efficacy of the proposed methods is illustrated through numerical simulations on synthetic and real-world graphs.

Submitted 18 July, 2018; originally announced July 2018.

arXiv:1709.08823 [pdf, ps, other]

A Randomized Greedy Algorithm for Near-Optimal Sensor Scheduling in Large-Scale Sensor Networks

Authors: Abolfazl Hashemi, Mahsa Ghasemi, Haris Vikalo, Ufuk Topcu

Abstract: We study the problem of scheduling sensors in a resource-constrained linear dynamical system, where the objective is to select a small subset of sensors from a large network to perform the state estimation task. We formulate this problem as the maximization of a monotone set function under a matroid constraint. We propose a randomized greedy algorithm that is significantly faster than state-of-the-art methods. By introducing the notion of curvature which quantifies how close a function is to being submodular, we analyze the performance of the proposed algorithm and find a bound on the expected mean square error (MSE) of the estimator that uses the selected sensors in terms of the optimal MSE. Moreover, we derive a probabilistic bound on the curvature for the scenario where the measurements are i.i.d. random vectors with bounded $\ell_2$ norm. Simulation results demonstrate efficacy of the randomized greedy algorithm in a comparison with greedy and semidefinite programming relaxation methods.

Submitted 3 April, 2018; v1 submitted 26 September, 2017; originally announced September 2017.

Showing 1–14 of 14 results for author: Vikalo, H