Search | arXiv e-print repository

Iterative Regularization with k-support Norm: An Important Complement to Sparse Recovery

Authors: William de Vazelhes, Bhaskar Mukhoty, Xiao-Tong Yuan, Bin Gu

Abstract: Sparse recovery is ubiquitous in machine learning and signal processing. Due to the NP-hard nature of sparse recovery, existing methods are known to suffer either from restrictive (or even unknown) applicability conditions, or high computational cost. Recently, iterative regularization methods have emerged as a promising fast approach because they can achieve sparse recovery in one pass through ea… ▽ More Sparse recovery is ubiquitous in machine learning and signal processing. Due to the NP-hard nature of sparse recovery, existing methods are known to suffer either from restrictive (or even unknown) applicability conditions, or high computational cost. Recently, iterative regularization methods have emerged as a promising fast approach because they can achieve sparse recovery in one pass through early stopping, rather than the tedious grid-search used in the traditional methods. However, most of those iterative methods are based on the $\ell_1$ norm which requires restrictive applicability conditions and could fail in many cases. Therefore, achieving sparse recovery with iterative regularization methods under a wider range of conditions has yet to be further explored. To address this issue, we propose a novel iterative regularization algorithm, IRKSN, based on the $k$-support norm regularizer rather than the $\ell_1$ norm. We provide conditions for sparse recovery with IRKSN, and compare them with traditional conditions for recovery with $\ell_1$ norm regularizers. Additionally, we give an early stopping bound on the model error of IRKSN with explicit constants, achieving the standard linear rate for sparse recovery. Finally, we illustrate the applicability of our algorithm on several experiments, including a support recovery experiment with a correlated design matrix. △ Less

Submitted 19 March, 2024; v1 submitted 19 December, 2023; originally announced January 2024.

Comments: Accepted at AAAI 2024. Code at https://github.com/wdevazelhes/IRKSN_AAAI2024

arXiv:2310.06483 [pdf, other]

Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory

Authors: Hilal AlQuabeh, Bhaskar Mukhoty, Bin Gu

Abstract: Pairwise learning is essential in machine learning, especially for problems involving loss functions defined on pairs of training examples. Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially. However, the pairwise nature of the problem makes scalability challenging, as the gradient computation for a new sample involves al… ▽ More Pairwise learning is essential in machine learning, especially for problems involving loss functions defined on pairs of training examples. Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially. However, the pairwise nature of the problem makes scalability challenging, as the gradient computation for a new sample involves all past samples. Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$. However, these approaches are primarily limited to linear models and have induced variance. In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret. Specifically, we establish a clear connection between the variance of online gradients and the regret, and construct online gradients using the most recent stratified samples with a limited buffer of size of $s$ representing all past data, which have a complexity of $O(sT)$ and employs $O(\sqrt{T}\log{T})$ random Fourier features for kernel approximation. Importantly, our theoretical results demonstrate that the variance-reduced online gradients lead to an improved sublinear regret bound. The experiments on real-world datasets demonstrate the superiority of our algorithm over both kernelized and linear online pairwise learning algorithms. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted in ACML2023

arXiv:2302.00910 [pdf, other]

Energy Efficient Training of SNN using Local Zeroth Order Method

Authors: Bhaskar Mukhoty, Velibor Bojkovic, William de Vazelhes, Giulia De Masi, Huan Xiong, Bin Gu

Abstract: Spiking neural networks are becoming increasingly popular for their low energy requirement in real-world tasks with accuracy comparable to the traditional ANNs. SNN training algorithms face the loss of gradient information and non-differentiability due to the Heaviside function in minimizing the model loss over model parameters. To circumvent the problem surrogate method uses a differentiable appr… ▽ More Spiking neural networks are becoming increasingly popular for their low energy requirement in real-world tasks with accuracy comparable to the traditional ANNs. SNN training algorithms face the loss of gradient information and non-differentiability due to the Heaviside function in minimizing the model loss over model parameters. To circumvent the problem surrogate method uses a differentiable approximation of the Heaviside in the backward pass, while the forward pass uses the Heaviside as the spiking function. We propose to use the zeroth order technique at the neuron level to resolve this dichotomy and use it within the automatic differentiation tool. As a result, we establish a theoretical connection between the proposed local zeroth-order technique and the existing surrogate methods and vice-versa. The proposed method naturally lends itself to energy-efficient training of SNNs on GPUs. Experimental results with neuromorphic datasets show that such implementation requires less than 1 percent neurons to be active in the backward pass, resulting in a 100x speed-up in the backward computation time. Our method offers better generalization compared to the state-of-the-art energy-efficient technique while maintaining similar efficiency. △ Less

Submitted 5 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2212.05430 [pdf, other]

Corruption-tolerant Algorithms for Generalized Linear Models

Authors: Bhaskar P Mukhoty, Debojyoti Dey, Purushottam Kar

Abstract: This paper presents SVAM (Sequential Variance-Altered MLE), a unified framework for learning generalized linear models under adversarial label corruption in training data. SVAM extends to tasks such as least squares regression, logistic regression, and gamma regression, whereas many existing works on learning with label corruptions focus only on least squares regression. SVAM is based on a novel v… ▽ More This paper presents SVAM (Sequential Variance-Altered MLE), a unified framework for learning generalized linear models under adversarial label corruption in training data. SVAM extends to tasks such as least squares regression, logistic regression, and gamma regression, whereas many existing works on learning with label corruptions focus only on least squares regression. SVAM is based on a novel variance reduction technique that may be of independent interest and works by iteratively solving weighted MLEs over variance-altered versions of the GLM objective. SVAM offers provable model recovery guarantees superior to the state-of-the-art for robust regression even when a constant fraction of training labels are adversarially corrupted. SVAM also empirically outperforms several existing problem-specific techniques for robust regression and classification. Code for SVAM is available at https://github.com/purushottamkar/svam/ △ Less

Submitted 11 December, 2022; originally announced December 2022.

Comments: 46 pages, 5 figures, to appear in the 31st AAAI Conference on Artificial Intelligence (AAAI), 2023

arXiv:2111.03932 [pdf, other]

AGGLIO: Global Optimization for Locally Convex Functions

Authors: Debojyoti Dey, Bhaskar Mukhoty, Purushottam Kar

Abstract: This paper presents AGGLIO (Accelerated Graduated Generalized LInear-model Optimization), a stage-wise, graduated optimization technique that offers global convergence guarantees for non-convex optimization problems whose objectives offer only local convexity and may fail to be even quasi-convex at a global scale. In particular, this includes learning problems that utilize popular activation funct… ▽ More This paper presents AGGLIO (Accelerated Graduated Generalized LInear-model Optimization), a stage-wise, graduated optimization technique that offers global convergence guarantees for non-convex optimization problems whose objectives offer only local convexity and may fail to be even quasi-convex at a global scale. In particular, this includes learning problems that utilize popular activation functions such as sigmoid, softplus and SiLU that yield non-convex training objectives. AGGLIO can be readily implemented using point as well as mini-batch SGD updates and offers provable convergence to the global optimum in general conditions. In experiments, AGGLIO outperformed several recently proposed optimization techniques for non-convex and locally convex objectives in terms of convergence rate as well as convergent accuracy. AGGLIO relies on a graduation technique for generalized linear models, as well as a novel proof strategy, both of which may be of independent interest. △ Less

Submitted 6 November, 2021; originally announced November 2021.

Comments: 33 pages, 7 figures, to appear at 9th ACM IKDD Conference on Data Science (CODS) 2022. Code for AGGLIO is available at https://github.com/purushottamkar/agglio/

arXiv:2009.02930 [pdf, other]

Unsupervised Learning Based Robust Multivariate Intrusion Detection System for Cyber-Physical Systems using Low Rank Matrix

Authors: Aneet K. Dutta, Bhaskar Mukhoty, Sandeep K. Shukla

Abstract: Regular and uninterrupted operation of critical infrastructures such as power, transport, communication etc. are essential for proper functioning of a country. Cyber-attacks causing disruption in critical infrastructure service in the past, are considered as a significant threat. With the advancement in technology and the progress of the critical infrastructures towards IP based communication, cyb… ▽ More Regular and uninterrupted operation of critical infrastructures such as power, transport, communication etc. are essential for proper functioning of a country. Cyber-attacks causing disruption in critical infrastructure service in the past, are considered as a significant threat. With the advancement in technology and the progress of the critical infrastructures towards IP based communication, cyber-physical systems are lucrative targets of the attackers. In this paper, we propose a robust multivariate intrusion detection system called RAD for detecting attacks in the cyber-physical systems in O(d) space and time complexity, where d is the number parameters in the system state vector. The proposed Intrusion Detection System(IDS) is developed in an unsupervised learning setting without using labelled data denoting attacks. It allows a fraction of the training data to be corrupted by outliers or under attack, by subscribing to robust training procedure. The proposed IDS outperforms existing anomaly detection techniques in several real-world datasets and attack scenarios. △ Less

Submitted 7 September, 2020; originally announced September 2020.

Comments: 9pages, 14 figures

arXiv:2006.14211 [pdf, other]

Globally-convergent Iteratively Reweighted Least Squares for Robust Regression Problems

Authors: Bhaskar Mukhoty, Govind Gopakumar, Prateek Jain, Purushottam Kar

Abstract: We provide the first global model recovery results for the IRLS (iteratively reweighted least squares) heuristic for robust regression problems. IRLS is known to offer excellent performance, despite bad initializations and data corruption, for several parameter estimation problems. Existing analyses of IRLS frequently require careful initialization, thus offering only local convergence guarantees.… ▽ More We provide the first global model recovery results for the IRLS (iteratively reweighted least squares) heuristic for robust regression problems. IRLS is known to offer excellent performance, despite bad initializations and data corruption, for several parameter estimation problems. Existing analyses of IRLS frequently require careful initialization, thus offering only local convergence guarantees. We remedy this by proposing augmentations to the basic IRLS routine that not only offer guaranteed global recovery, but in practice also outperform state-of-the-art algorithms for robust regression. Our routines are more immune to hyperparameter misspecification in basic regression tasks, as well as applied tasks such as linear-armed bandit problems. Our theoretical analyses rely on a novel extension of the notions of strong convexity and smoothness to weighted strong convexity and smoothness, and establishing that sub-Gaussian designs offer bounded weighted condition numbers. These notions may be useful in analyzing other algorithms as well. △ Less

Submitted 25 June, 2020; originally announced June 2020.

Comments: 30 pages, 5 figures, appeared as a publication in the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019

Journal ref: Proceedings of Machine Learning Research (PMLR) 89:313-322, 2019

arXiv:2005.11257 [pdf, other]

Epidemiologically and Socio-economically Optimal Policies via Bayesian Optimization

Authors: Amit Chandak, Debojyoti Dey, Bhaskar Mukhoty, Purushottam Kar

Abstract: Mass public quarantining, colloquially known as a lock-down, is a non-pharmaceutical intervention to check spread of disease. This paper presents ESOP (Epidemiologically and Socio-economically Optimal Policies), a novel application of active machine learning techniques using Bayesian optimization, that interacts with an epidemiological model to arrive at lock-down schedules that optimally balance… ▽ More Mass public quarantining, colloquially known as a lock-down, is a non-pharmaceutical intervention to check spread of disease. This paper presents ESOP (Epidemiologically and Socio-economically Optimal Policies), a novel application of active machine learning techniques using Bayesian optimization, that interacts with an epidemiological model to arrive at lock-down schedules that optimally balance public health benefits and socio-economic downsides of reduced economic activity during lock-down periods. The utility of ESOP is demonstrated using case studies with VIPER (Virus-Individual-Policy-EnviRonment), a stochastic agent-based simulator that this paper also proposes. However, ESOP is flexible enough to interact with arbitrary epidemiological simulators in a black-box manner, and produce schedules that involve multiple phases of lock-downs. △ Less

Submitted 14 June, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: Keywords: COVID-19, Optimal Policy, Lock-down, Epidemiology, Bayesian Optimization Code available at https://github.com/purushottamkar/esop

MSC Class: 92D30 (Primary) 90C26; 90C56; 60G15 (Secondary)

arXiv:1904.13081 [pdf, other]

Sequence to sequence deep learning models for solar irradiation forecasting

Authors: Bhaskar Pratim Mukhoty, Vikas Maurya, Sandeep Kumar Shukla

Abstract: The energy output a photo voltaic(PV) panel is a function of solar irradiation and weather parameters like temperature and wind speed etc. A general measure for solar irradiation called Global Horizontal Irradiance (GHI), customarily reported in Watt/meter$^2$, is a generic indicator for this intermittent energy resource. An accurate prediction of GHI is necessary for reliable grid integration of… ▽ More The energy output a photo voltaic(PV) panel is a function of solar irradiation and weather parameters like temperature and wind speed etc. A general measure for solar irradiation called Global Horizontal Irradiance (GHI), customarily reported in Watt/meter$^2$, is a generic indicator for this intermittent energy resource. An accurate prediction of GHI is necessary for reliable grid integration of the renewable as well as for power market trading. While some machine learning techniques are well introduced along with the traditional time-series forecasting techniques, deep-learning techniques remains less explored for the task at hand. In this paper we give deep learning models suitable for sequence to sequence prediction of GHI. The deep learning models are reported for short-term forecasting $\{1-24\}$ hour along with the state-of-the art techniques like Gradient Boosted Regression Trees(GBRT) and Feed Forward Neural Networks(FFNN). We have checked that spatio-temporal features like wind direction, wind speed and GHI of neighboring location improves the prediction accuracy of the deep learning models significantly. Among the various sequence-to-sequence encoder-decoder models LSTM performed superior, handling short-comings of the state-of-the-art techniques. △ Less

Submitted 30 April, 2019; originally announced April 2019.

arXiv:1711.07221 [pdf, other]

Model Extraction Warning in MLaaS Paradigm

Authors: Manish Kesarwani, Bhaskar Mukhoty, Vijay Arya, Sameep Mehta

Abstract: Cloud vendors are increasingly offering machine learning services as part of their platform and services portfolios. These services enable the deployment of machine learning models on the cloud that are offered on a pay-per-query basis to application developers and end users. However recent work has shown that the hosted models are susceptible to extraction attacks. Adversaries may launch queries… ▽ More Cloud vendors are increasingly offering machine learning services as part of their platform and services portfolios. These services enable the deployment of machine learning models on the cloud that are offered on a pay-per-query basis to application developers and end users. However recent work has shown that the hosted models are susceptible to extraction attacks. Adversaries may launch queries to steal the model and compromise future query payments or privacy of the training data. In this work, we present a cloud-based extraction monitor that can quantify the extraction status of models by observing the query and response streams of both individual and colluding adversarial users. We present a novel technique that uses information gain to measure the model learning rate by users with increasing number of queries. Additionally, we present an alternate technique that maintains intelligent query summaries to measure the learning rate relative to the coverage of the input feature space in the presence of collusion. Both these approaches have low computational overhead and can easily be offered as services to model owners to warn them of possible extraction attacks from adversaries. We present performance results for these approaches for decision tree models deployed on BigML MLaaS platform, using open source datasets and different adversarial attack strategies. △ Less

Submitted 20 November, 2017; originally announced November 2017.

arXiv:1507.05409 [pdf, other]

A Parameter-free Affinity Based Clustering

Authors: Bhaskar Mukhoty, Ruchir Gupta, Y. N. Singh

Abstract: Several methods have been proposed to estimate the number of clusters in a dataset; the basic ideal behind all of them has been to study an index that measures inter-cluster separation and intra-cluster cohesion over a range of cluster numbers and report the number which gives an optimum value of the index. In this paper we propose a simple, parameter free approach that is like human cognition to… ▽ More Several methods have been proposed to estimate the number of clusters in a dataset; the basic ideal behind all of them has been to study an index that measures inter-cluster separation and intra-cluster cohesion over a range of cluster numbers and report the number which gives an optimum value of the index. In this paper we propose a simple, parameter free approach that is like human cognition to form clusters, where closely lying points are easily identified to form a cluster and total number of clusters are revealed. To identify closely lying points, affinity of two points is defined as a function of distance and a threshold affinity is identified, above which two points in a dataset are likely to be in the same cluster. Well separated clusters are identified even in the presence of outliers, whereas for not so well separated dataset, final number of clusters are estimated and the detected clusters are merged to produce the final clusters. Experiments performed with several large dimensional synthetic and real datasets show good results with robustness to noise and density variation within dataset. △ Less

Submitted 11 January, 2016; v1 submitted 20 July, 2015; originally announced July 2015.

Showing 1–11 of 11 results for author: Mukhoty, B