Search | arXiv e-print repository

EDN: A Novel Edge-Dependent Noise Model for Graph Data

Authors: Pintu Kumar, Nandyala Hemachandra

Abstract: An important structural feature of a graph is its set of edges, as it captures the relationships among the nodes (the graph's topology). Existing node label noise models like Symmetric Label Noise (SLN) and Class Conditional Noise (CCN) disregard this important node relationship in graph data; and the Edge-Dependent Noise (EDN) model addresses this limitation. EDN posits that in real-world scenarios, label noise may be influenced by the connections between nodes. We explore three variants of EDN. A crucial notion that relates nodes and edges in a graph is the degree of a node; we show that in all three variants, the probability of a node's label corruption is dependent on its degree. Additionally, we compare the dependence of these probabilities on node degree across different variants. We performed experiments on popular graph datasets using 5 different GNN architectures and 8 noise robust algorithms for graph data. The results demonstrate that 2 variants of EDN lead to greater performance degradation in both Graph Neural Networks (GNNs) and existing noise-robust algorithms, as compared to traditional node label noise models. We statistically verify this by posing a suitable hypothesis-testing problem. This emphasizes the importance of incorporating EDN when evaluating noise robust algorithms for graphs, to enhance the reliability of graph-based learning in noisy environments.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.00244 [pdf, ps, other]

DeGLIF for Label Noise Robust Node Classification using GNNs

Authors: Pintu Kumar, Nandyala Hemachandra

Abstract: Noisy labelled datasets are generally inexpensive compared to clean labelled datasets, and the same is true for graph data. In this paper, we propose a denoising technique DeGLIF: Denoising Graph Data using Leave-One-Out Influence Function. DeGLIF uses a small set of clean data and the leave-one-out influence function to make label noise robust node-level prediction on graph data. Leave-one-out influence function approximates the change in the model parameters if a training point is removed from the training dataset. Recent advances propose a way to calculate the leave-one-out influence function for Graph Neural Networks (GNNs). We extend that recent work to estimate the change in validation loss, if a training node is removed from the training dataset. We use this estimate and a new theoretically motivated relabelling function to denoise the training dataset. We propose two DeGLIF variants to identify noisy nodes. Both these variants do not require any information about the noise model or the noise level in the dataset; DeGLIF also does not estimate these quantities. For one of these variants, we prove that the noisy points detected can indeed increase risk. We carry out detailed computational experiments on different datasets to show the effectiveness of DeGLIF. It achieves better accuracy than other baseline algorithms.

Submitted 30 May, 2025; originally announced June 2025.

arXiv:2406.05526 [pdf, ps, other]

Optimal Storage Design: An $L^{\infty}$ infused Inventory Control

Authors: Madhu Dhiman, Veeraruna Kavitha, Nandyala Hemachandra

Abstract: Inventory and queueing systems are often designed by controlling a weighted combination of some time-averaged performance metrics (like cumulative holding, shortage, server-utilization or congestion costs); but real-world constraints, like fixed storage or limited waiting space, require attention to peak levels reached during the operating period. This work formulates such control problems, whose cost is an arbitrary weighted combination of some integral cost terms and an L-infinity (peak-level) term. The resultant control problem does not fall into the standard control framework, nor does it have a standard solution in terms of some partial differential equations. We introduce an auxiliary state variable to track the instantaneous peak-levels, enabling reformulation into the classical framework. We then propose a smooth approximation to handle the resultant discontinuities, and show the existence of a unique value function that uniquely solves the corresponding Hamilton-Jacobi-Bellman equation. We apply this framework to two key applications to obtain an optimal design that includes controlling the peak-levels. Surprisingly, the numerical results show peak inventory can be minimized with negligible revenue loss (under 6%); without considering peak-control, the peak levels were significantly higher. The peak-optimal policies for the queueing system can reduce peak-congestion by up to 27%, but at the expense of higher cumulative-congestion costs. Thus, for inventory control, the performance of the average terms did not degrade much, while the same is not true for the queueing system. Hence, one would require a judiciously chosen weighted design of all the costs involved, including the peak-levels, for any application, and such a design can now be derived numerically using the proposed framework.

Submitted 21 June, 2025; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2310.20280 [pdf, other]

AutoMixer for Improved Multivariate Time-Series Forecasting on Business and IT Observability Data

Authors: Santosh Palaskar, Vijay Ekambaram, Arindam Jati, Neelamadhav Gantayat, Avirup Saha, Seema Nagar, Nam H. Nguyen, Pankaj Dayama, Renuka Sindhgatta, Prateeti Mohapatra, Harshit Kumar, Jayant Kalagnanam, Nandyala Hemachandra, Narayan Rangaraj

Abstract: The efficiency of business processes relies on business key performance indicators (Biz-KPIs), which can be negatively impacted by IT failures. Business and IT Observability (BizITObs) data fuses both Biz-KPIs and IT event channels together as multivariate time series data. Forecasting Biz-KPIs in advance can enhance efficiency and revenue through proactive corrective measures. However, BizITObs data generally exhibit both useful and noisy inter-channel interactions between Biz-KPIs and IT events that need to be effectively decoupled. This leads to suboptimal forecasting performance when existing multivariate forecasting models are employed. To address this, we introduce AutoMixer, a time-series Foundation Model (FM) approach, grounded on the novel technique of channel-compressed pretrain and finetune workflows. AutoMixer leverages an AutoEncoder for channel-compressed pretraining and integrates it with the advanced TSMixer model for multivariate time series forecasting. This fusion greatly enhances the potency of TSMixer for accurate forecasts and also generalizes well across several downstream tasks. Through detailed experiments and dashboard analytics, we show AutoMixer's capability to consistently improve the Biz-KPI's forecasting accuracy (by 11-15%) which directly translates to actionable business insights.

Submitted 2 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted in the Thirty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-24)

arXiv:2303.07834 [pdf, other]

Finite-Horizon Constrained MDPs With Both Additive And Multiplicative Utilities

Authors: Uday Kumar M, Sanjay P Bhat, Veeraruna Kavitha, Nandyala Hemachandra

Abstract: This paper considers the problem of finding a solution to finite-horizon constrained Markov decision processes (CMDPs) where the objective as well as the constraints are sums of additive and multiplicative utilities. Towards solving this, we construct another CMDP, with only additive utilities under a restricted set of policies, whose optimal value is equal to that of the original CMDP. Furthermore, we provide a finite dimensional bilinear program (BLP) whose value equals the CMDP value and whose solution provides the optimal policy. We also suggest an algorithm to solve this BLP.

Submitted 15 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

arXiv:2301.10993 [pdf, other]

Multi-Agent Congestion Cost Minimization With Linear Function Approximations

Authors: Prashant Trivedi, Nandyala Hemachandra

Abstract: This work considers multiple agents traversing a network from a source node to the goal node. The cost to an agent for traveling a link has a private as well as a congestion component. The agent's objective is to find a path to the goal node with minimum overall cost in a decentralized way. We model this as a fully decentralized multi-agent reinforcement learning problem and propose a novel multi-agent congestion cost minimization (MACCM) algorithm. Our MACCM algorithm uses linear function approximations of transition probabilities and the global cost function. In the absence of a central controller and to preserve privacy, agents communicate the cost function parameters to their neighbors via a time-varying communication network. Moreover, each agent maintains its estimate of the global state-action value, which is updated via a multi-agent extended value iteration (MAEVI) sub-routine. We show that our MACCM algorithm achieves a sub-linear regret. The proof requires the convergence of cost function parameters, the MAEVI algorithm, and analysis of the regret bounds induced by the MAEVI triggering condition for each agent. We implement our algorithm on a two node network with multiple links to validate it. We first identify the optimal policy, the optimal number of agents going to the goal node in each period. We observe that the average regret is close to zero for 2 and 3 agents. The optimal policy captures the trade-off between the minimum cost of staying at a node and the congestion cost of going to the goal node. Our work is a generalization of learning the stochastic shortest path problem.

Submitted 23 February, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: Accepted at International Conference on Artificial Intelligence and Statistics (AISTATS) 2023

arXiv:2209.14963 [pdf, ps, other]

Approximate Solutions To Constrained Risk-Sensitive Markov Decision Processes

Authors: Uday Kumar M, Sanjay P Bhat, Veeraruna Kavitha, Nandyala Hemachandra

Abstract: This paper considers the problem of finding near-optimal Markovian randomized (MR) policies for finite-state-action, infinite-horizon, constrained risk-sensitive Markov decision processes (CRSMDPs). Constraints are in the form of standard expected discounted cost functions as well as expected risk-sensitive discounted cost functions over finite and infinite horizons. The main contribution is to show that the problem possesses a solution if it is feasible, and to provide two methods for finding an approximate solution in the form of an ultimately stationary (US) MR policy. The latter is achieved through two approximating finite-horizon CRSMDPs which are constructed from the original CRSMDP by time-truncating the original objective and constraint cost functions, and suitably perturbing the constraint upper bounds. The first approximation gives a US policy which is $ε$-optimal and feasible for the original problem, while the second approximation gives a near-optimal US policy whose violation of the original constraints is bounded above by a specified $ε$. A key step in the proofs is an appropriate choice of a metric that makes the set of infinite-horizon MR policies and the feasible regions of the three CRSMDPs compact, and the objective and constraint functions continuous. A linear-programming-based formulation for solving the approximating finite-horizon CRSMDPs is also given.

Submitted 29 September, 2022; originally announced September 2022.

Comments: 38 pages

arXiv:2207.01988 [pdf, other]

Unsupervised Crowdsourcing with Accuracy and Cost Guarantees

Authors: Yashvardhan Didwania, Jayakrishnan Nair, N. Hemachandra

Abstract: We consider the problem of cost-optimal utilization of a crowdsourcing platform for binary, unsupervised classification of a collection of items, given a prescribed error threshold. Workers on the crowdsourcing platform are assumed to be divided into multiple classes, based on their skill, experience, and/or past performance. We model each worker class via an unknown confusion matrix, and a (known) price to be paid per label prediction. For this setting, we propose algorithms for acquiring label predictions from workers, and for inferring the true labels of items. We prove that if the number of (unlabeled) items available is large enough, our algorithms satisfy the prescribed error thresholds, incurring a cost that is near-optimal. Finally, we validate our algorithms, and some heuristics inspired by them, through an extensive case study.

Submitted 5 July, 2022; originally announced July 2022.

Comments: To be presented at WiOpt 2022

arXiv:2112.13514 [pdf, other]

Anomaly Detection using Capsule Networks for High-dimensional Datasets

Authors: Inderjeet Singh, Nandyala Hemachandra

Abstract: Anomaly detection is an essential problem in machine learning. Application areas include network security, health care, fraud detection, etc., involving high-dimensional datasets. A typical anomaly detection system always faces the class-imbalance problem in the form of a vast difference in the sample sizes of different classes. They usually have class overlap problems. This study used a capsule network for the anomaly detection task. To the best of our knowledge, this is the first instance where a capsule network is analyzed for the anomaly detection task in a high-dimensional complex data setting. We also handle the related novelty and outlier detection problems. The architecture of the capsule network was suitably modified for a binary classification task. Capsule networks offer a good option for detecting anomalies due to the effect of viewpoint invariance captured in its predictions and viewpoint equivariance captured in internal capsule architecture. We used six-layered under-complete autoencoder architecture with second and third layers containing capsules. The capsules were trained using the dynamic routing algorithm. We created $10$-imbalanced datasets from the original MNIST dataset and compared the performance of the capsule network with $5$ baseline models. Our leading test set measures are F1-score for minority class and area under the ROC curve. We found that the capsule network outperformed every other baseline model on the anomaly detection task by using only ten epochs for training and without using any other data level and algorithm level approach. Thus, we conclude that capsule networks are excellent in modeling complex high-dimensional imbalanced datasets for the anomaly detection task.

Submitted 27 December, 2021; v1 submitted 27 December, 2021; originally announced December 2021.

Comments: Submitted to ACML2019

arXiv:2109.07738 [pdf, other]

Noise Robust Core-Stable Coalitions of Hedonic Games

Authors: Prashant Trivedi, Nandyala Hemachandra

Abstract: We consider the coalition formation games with an additional component, 'noisy preferences'. Moreover, such noisy preferences are available only for a sample of coalitions. We propose a multiplicative noise model and obtain the prediction probability, defined as the probability that the estimated PAC core-stable partition of the noisy game is also PAC core-stable for the unknown noise-free game. This prediction probability depends on the probability of a combinatorial construct called an 'agreement event'. We explicitly obtain the agreement probability for an $n$ agent noisy game with an $l \geq 2$ support noise distribution. For a user-given satisfaction value on this probability, we identify the noise regimes for which an estimated partition is noise robust; that is, it is PAC core-stable in both noisy and noise-free games. We obtain similar robustness results when the estimated partition is not PAC core-stable. These noise regimes correspond to the level sets of the agreement probability function and are non-convex sets. Moreover, an important fact is that the prediction probability can be high even if high noise values occur with a high probability. Further, for a class of top-responsive hedonic games, we obtain the bounds on the extra noisy samples required to get noise robustness with a user-given satisfaction value. We completely solve the noise robustness problem of a $2$ agent hedonic game. In particular, we obtain the prediction probability function for the $l=2$ and $l=3$ noise support cases. For $l=2$, the prediction probability is convex in noise probability, but the noise robust regime is non-convex. Its minimum value, called the safety value, is 0.62; so, below 0.62, the noise robust regime is the entire probability simplex. However, for $l \geq 3$, the prediction probability is non-convex; so, the safety value is the global minimum of a non-convex function and is computationally hard.

Submitted 24 January, 2023; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: Accepted in Asian Conference on Machine Learning 2022. To appear in Proceedings of Machine Learning Research 189, 2022

arXiv:2109.01654 [pdf, other]

Multi-agent Natural Actor-critic Reinforcement Learning Algorithms

Authors: Prashant Trivedi, Nandyala Hemachandra

Abstract: Multi-agent actor-critic algorithms are an important part of the Reinforcement Learning paradigm. We propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms in this work. The objective is to collectively find a joint policy that maximizes the average long-term return of these agents. In the absence of a central controller and to preserve privacy, agents communicate some information to their neighbors via a time-varying communication network. We prove convergence of all the 3 MAN algorithms to a globally asymptotically stable set of the ODE corresponding to actor update; these use linear function approximations. We show that the Kullback-Leibler divergence between policies of successive iterates is proportional to the objective function's gradient. We observe that the minimum singular value of the Fisher information matrix is well within the reciprocal of the policy parameter dimension. Using this, we theoretically show that the optimal value of the deterministic variant of the MAN algorithm at each iterate dominates that of the standard gradient-based multi-agent actor-critic (MAAC) algorithm. To our knowledge, it is a first such result in multi-agent reinforcement learning (MARL). To illustrate the usefulness of our proposed algorithms, we implement them on a bi-lane traffic network to reduce the average network congestion. We observe an almost 25% reduction in the average congestion in 2 MAN algorithms; the average congestion in another MAN algorithm is on par with the MAAC algorithm. We also consider a generic $15$ agent MARL; the performance of the MAN algorithms is again as good as the MAAC algorithm.

Submitted 2 April, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: A very high-level summary of our revision is: In Section 3.5, we theoretically prove that the objective function value from the deterministic variant of MAN algorithms dominates that of the MAAC algorithm under some minimal conditions. It relies on the Lemma 2 of our paper: the minimum singular value of the Fisher information matrix is well within the reciprocal of the policy parameter dimension

arXiv:2010.09577 [pdf, other]

GANs for learning from very high class conditional noisy labels

Authors: Sandhya Tripathi, N Hemachandra

Abstract: We use Generative Adversarial Networks (GANs) to design a class conditional label noise (CCN) robust scheme for binary classification. It first generates a set of correctly labelled data points from noisy labelled data and 0.1% or 1% clean labels such that the generated and true (clean) labelled data distributions are close; generated labelled data is used to learn a good classifier. The mode collapse problem while generating correct feature-label pairs and the problem of skewed feature-label dimension ratio ($\sim$ 784:1) are avoided by using Wasserstein GAN and a simple data representation change. Another WGAN with information-theoretic flavour on top of the new representation is also proposed. The major advantage of both schemes is their significant improvement over the existing ones in presence of very high CCN rates, without either estimating or cross-validating over the noise rates. We proved that KL divergence between clean and noisy distribution increases w.r.t. noise rates in symmetric label noise model; can be extended to high CCN rates. This implies that our schemes perform well due to the adversarial nature of GANs. Further, use of generative approach (learning clean joint distribution) while handling noise enables our schemes to perform better than discriminative approaches like GLC, LDMI and GCE; even when the classes are highly imbalanced. Using Friedman F test and Nemenyi posthoc test, we showed that on high dimensional binary class synthetic, MNIST and Fashion MNIST datasets, our schemes outperform the existing methods and demonstrate consistent performance across noise rates.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2009.07554 [pdf, other]

Thompson Sampling for Unsupervised Sequential Selection

Authors: Arun Verma, Manjesh K. Hanawal, Nandyala Hemachandra

Abstract: Thompson Sampling has generated significant interest due to its better empirical performance than upper confidence bound based algorithms. In this paper, we study a Thompson Sampling based algorithm for the Unsupervised Sequential Selection (USS) problem. The USS problem is a variant of the stochastic multi-armed bandits problem, where the loss of an arm cannot be inferred from the observed feedback. In the USS setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, the learner selects an arm and observes the feedback from arms up to the selected arm. The learner's goal is to find the arm that minimizes the expected total loss. The total loss is the sum of the cost incurred for selecting the arm and the stochastic loss associated with the selected arm. The problem is challenging because, without knowing the mean loss, one cannot compute the total loss for the selected arm. Clearly, learning is feasible only if the optimal arm can be inferred from the problem structure. As shown in the prior work, learning is possible when the problem instance satisfies the so-called 'Weak Dominance' (WD) property. Under WD, we show that our Thompson Sampling based algorithm for the USS problem achieves near optimal regret and has better numerical performance than existing algorithms.

Submitted 16 September, 2020; originally announced September 2020.

Comments: Accepted to ACML 2020

arXiv:2008.07330 [pdf, other]

Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure

Authors: Puja Sahu, Nandyala Hemachandra

Abstract: We investigate optimal posteriors for recently introduced \cite{begin2016pac} chi-squared divergence based PAC-Bayesian bounds in terms of nature of their distribution, scalability of computations, and test set performance. For a finite classifier set, we deduce bounds for three distance functions: KL-divergence, linear and squared distances. Optimal posterior weights are proportional to deviations of empirical risks, usually with subset support. For uniform prior, it is sufficient to search among posteriors on classifier subsets ordered by these risks. We show the bound minimization for linear distance as a convex program and obtain a closed-form expression for its optimal posterior. Whereas that for squared distance is a quasi-convex program under a specific condition, and the one for KL-divergence is non-convex optimization (a difference of convex functions). To compute such optimal posteriors, we derive fast converging fixed point (FP) equations. We apply these approaches to a finite set of SVM regularization parameter values to yield stochastic SVMs with tight bounds. We perform a comprehensive performance comparison between our optimal posteriors and known KL-divergence based posteriors on a variety of UCI datasets with varying ranges and variances in risk values, etc. Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors. Our study highlights the impact of divergence function on the performance of PAC-Bayesian classifiers. We compare our stochastic classifiers with cross-validation based deterministic classifier. The latter has better test errors, but ours is more sample robust, has quantifiable generalization guarantees, and is computationally much faster.

Submitted 13 August, 2020; originally announced August 2020.

Comments: arXiv admin note: text overlap with arXiv:1912.06803

arXiv:2001.03956 [pdf, other]

Interpretable feature subset selection: A Shapley value based approach

Authors: Sandhya Tripathi, N. Hemachandra, Prashant Trivedi

Abstract: For feature selection and related problems, we introduce the notion of classification game, a cooperative game, with features as players and hinge loss based characteristic function and relate a feature's contribution to Shapley value based error apportioning (SVEA) of total training error. Our major contribution is ($\star$) to show that for any dataset the threshold 0 on SVEA value identifies the feature subset whose joint interactions for label prediction are significant or those features that span a subspace where the data is predominantly lying. In addition, our scheme ($\star$) identifies the features on which Bayes classifier doesn't depend but any surrogate loss function based finite sample classifier does; this contributes to the excess $0$-$1$ risk of such a classifier, ($\star$) estimates unknown true hinge risk of a feature, and ($\star$) relates the stability property of an allocation and negative valued SVEA by designing the analogue of core of classification game. Due to Shapley value's computationally expensive nature, we build on a known Monte Carlo based approximation algorithm that computes characteristic function (Linear Programs) only when needed. We address the potential sample bias problem in feature selection by providing interval estimates for SVEA values obtained from multiple sub-samples. We illustrate all the above aspects on various synthetic and real datasets and show that our scheme achieves better results than existing recursive feature elimination technique and ReliefF in most cases. Our theoretically grounded classification game in terms of well defined characteristic function offers interpretability (which we formalize in terms of final task) and explainability of our framework, including identification of important features.

Submitted 25 April, 2021; v1 submitted 12 January, 2020; originally announced January 2020.

Comments: A shorter version of this work appeared in a special session titled Explainable AI at IEEE BigData'20 conference. More experiments and a new notion of interpretable FSS introduced in this version. Earlier plots for sample bias robustness are corrected and updated

arXiv:2001.00626 [pdf, other]

Unsupervised Online Feature Selection for Cost-Sensitive Medical Diagnosis

Authors: Arun Verma, Manjesh K. Hanawal, Nandyala Hemachandra

Abstract: In medical diagnosis, physicians predict the state of a patient by checking measurements (features) obtained from a sequence of tests, e.g., blood test, urine test, followed by invasive tests. As tests are often costly, one would like to obtain only those features (tests) that can establish the presence or absence of the state conclusively. Another aspect of medical diagnosis is that we are often faced with unsupervised prediction tasks as the true state of the patients may not be known. Motivated by such medical diagnosis problems, we consider a {\it Cost-Sensitive Medical Diagnosis} (CSMD) problem, where the true state of patients is unknown. We formulate the CSMD problem as a feature selection problem where each test gives a feature that can be used in a prediction model. Our objective is to learn strategies for selecting the features that give the best trade-off between accuracy and costs. We exploit the `Weak Dominance' property of problem to develop online algorithms that identify a set of features which provides an `optimal' trade-off between cost and accuracy of prediction without requiring to know the true state of the medical condition. Our empirical results validate the performance of our algorithms on problem instances generated from real-world datasets.

Submitted 25 December, 2019; originally announced January 2020.

Comments: Accepted to NetHealth Workshop at COMSNETS 2020

arXiv:1912.06803 [pdf, other]

Optimal PAC-Bayesian Posteriors for Stochastic Classifiers and their use for Choice of SVM Regularization Parameter

Authors: Puja Sahu, Nandyala Hemachandra

Abstract: PAC-Bayesian set up involves a stochastic classifier characterized by a posterior distribution on a classifier set, offers a high probability bound on its averaged true risk and is robust to the training sample used. For a given posterior, this bound captures the trade off between averaged empirical risk and KL-divergence based model complexity term. Our goal is to identify an optimal posterior with the least PAC-Bayesian bound. We consider a finite classifier set and 5 distance functions: KL-divergence, its Pinsker's and a sixth degree polynomial approximations; linear and squared distances. Linear distance based model results in a convex optimization problem. We obtain closed form expression for its optimal posterior. For uniform prior, this posterior has full support with weights negative-exponentially proportional to number of misclassifications. Squared distance and Pinsker's approximation bounds are possibly quasi-convex and are observed to have single local minimum. We derive fixed point equations (FPEs) using partial KKT system with strict positivity constraints. This obviates the combinatorial search for subset support of the optimal posterior. For uniform prior, exponential search on a full-dimensional simplex can be limited to an ordered subset of classifiers with increasing empirical risk values. These FPEs converge rapidly to a stationary point, even for a large classifier set when a solver fails. We apply these approaches to SVMs generated using a finite set of SVM regularization parameter values on 9 UCI datasets.
These posteriors yield stochastic SVM classifiers with tight bounds. KL-divergence based bound is the tightest, but is computationally expensive due to non-convexity and multiple calls to a root finding algorithm. Optimal posteriors for all 5 distance functions have lowest 10% test error values on most datasets, with linear distance being the easiest to obtain.

Submitted 14 December, 2019; originally announced December 2019.

Comments: 56 pages, 6 Figures, ACML 2019 conference paper with supplementary material

Journal ref: Proceedings of The Eleventh Asian Conference on Machine Learning, in PMLR 101:268-283 (2019)

arXiv:1911.07875 [pdf, other]

Attribute noise robust binary classification

Authors: Aditya Petety, Sandhya Tripathi, N Hemachandra

Abstract: We consider the problem of learning linear classifiers when both features and labels are binary. In addition, the features are noisy, i.e., they could be flipped with an unknown probability. In Sy-De attribute noise model, where all features could be noisy together with same probability, we show that $0$-$1$ loss ($l_{0-1}$) need not be robust but a popular surrogate, squared loss ($l_{sq}$) is. In Asy-In attribute noise model, we prove that $l_{0-1}$ is robust for any distribution over 2 dimensional feature space. However, due to computational intractability of $l_{0-1}$, we resort to $l_{sq}$ and observe that it need not be Asy-In noise robust. Our empirical results support Sy-De robustness of squared loss for low to moderate noise rates.

Submitted 18 November, 2019; originally announced November 2019.

Comments: Accepted for Student Abstract presentation at AAAI2020

arXiv:1901.02271 [pdf, ps, other]

Cost Sensitive Learning in the Presence of Symmetric Label Noise

Authors: Sandhya Tripathi, N. Hemachandra

Abstract: In binary classification framework, we are interested in making cost sensitive label predictions in the presence of uniform/symmetric label noise. We first observe that $0$-$1$ Bayes classifiers are not (uniform) noise robust in cost sensitive setting. To circumvent this impossibility result, we present two schemes; unlike the existing methods, our schemes do not require noise rate. The first one uses $α$-weighted $γ$-uneven margin squared loss function, $l_{α, usq}$, which can handle cost sensitivity arising due to domain requirement (using user given $α$) or class imbalance (by tuning $γ$) or both. However, we observe that $l_{α, usq}$ Bayes classifiers are also not cost sensitive and noise robust. We show that regularized ERM of this loss function over the class of linear classifiers yields a cost sensitive uniform noise robust classifier as a solution of a system of linear equations. We also provide a performance bound for this classifier. The second scheme that we propose is a re-sampling based scheme that exploits the special structure of the uniform noise models and uses in-class probability $η$ estimates. Our computational experiments on some UCI datasets with class imbalance show that classifiers of our two schemes are on par with the existing methods and in fact better in some cases w.r.t. Accuracy and Arithmetic Mean, without using/tuning noise rate. We also consider other cost sensitive performance measures viz., F measure and Weighted Cost for evaluation.
As our re-sampling scheme requires estimates of $η$, we provide a detailed comparative study of various $η$ estimation methods on synthetic datasets, w.r.t. half a dozen evaluation criterion. Also, we provide understanding on the interpretation of cost parameters $α$ and $γ$ using different synthetic data experiments.

Submitted 10 January, 2020; v1 submitted 8 January, 2019; originally announced January 2019.

Comments: Version 4 updates: Added comparison to noise rate $ρ=0$ in Table 1,2, 9 and 10. Also, corrected the alpha values in the Table 1,2, 9 and 10

arXiv:1810.08021 [pdf, other]

On a Conjecture for Dynamic Priority Queues and Nash Equilibrium for Quality of Service Sensitive Markets

Authors: Manu K. Gupta, N. Hemachandra

Abstract: Many economic transactions, including those of online markets, have a time lag between the start and end times of transactions. Customers need to wait for completion of their transaction (order fulfillment) and hence are also interested in their waiting time as a Quality of Service (QoS) attribute. So, they factor this QoS in the demand they offer to the firm (service-provider) and some customers (user-set) would be willing to pay for shorter waiting times. On the other hand, such waiting times depend on the demand user-set offers to the service-provider. We model the above economic-QoS strategic interaction between service-provider and user-set under a fairly generic scheduling framework as a non-cooperative constrained game. We use an existing joint pricing and scheduling model. An optimal solution to this joint pricing and scheduling problem was guaranteed by a finite step algorithm subject to a conjecture. We first settle this conjecture based on queuing and optimization arguments and discuss its implications on the above game. We show that a continuum of Nash equilibria (NE) exists and it can be computed easily using constrained best response dynamics. Revenue maximal NE is identified by above finite step algorithm. We illustrate how both players can benefit at such revenue maximal NE by identifying suitable operational decisions, i.e., by choosing an appropriate game along the theme of pricing and revenue management.

Submitted 18 October, 2018; originally announced October 2018.

arXiv:1804.03564 [pdf, other]

Some parametrized dynamic priority policies for 2-class M/G/1 queues: completeness and applications

Authors: Manu K. Gupta, N. Hemachandra, J. Venkateswaran

Abstract: Completeness of a dynamic priority scheduling scheme is of fundamental importance for the optimal control of queues in areas as diverse as computer communications, communication networks, supply chains and manufacturing systems. Our first main contribution is to identify the mean waiting time completeness as a unifying aspect for four different dynamic priority scheduling schemes by proving their completeness and equivalence in 2-class M/G/1 queue. These dynamic priority schemes are earliest due date based, head of line priority jump, relative priority, and probabilistic priority. In our second main contribution, we characterize the optimal scheduling policies for the case studies in different domains by exploiting the completeness of above dynamic priority schemes. The major theme of second main contribution is resource allocation/optimal control in revenue management problems for contemporary systems such as cloud computing, high-performance computing, etc., where congestion is inherent. Using completeness and theoretically tractable nature of relative priority policy, we study the impact of approximation in a fairly generic data network utility framework. We introduce the notion of min-max fairness in multi-class queues and show that a simple global FCFS policy is min-max fair. Next, we re-derive the celebrated $c/ρ$ rule for 2-class M/G/1 queues by an elegant argument and also simplify a complex joint pricing and scheduling problem for a wider class of scheduling policies.

Submitted 10 April, 2018; originally announced April 2018.

arXiv:1605.00977 [pdf, ps, other]

Blackwell-Nash Equilibrium for Discrete and Continuous Time Stochastic Games

Authors: Vikas Vikram Singh, N. Hemachandra

Abstract: We consider both discrete and continuous time finite state-action stochastic games. In discrete time stochastic games, it is known that a stationary Blackwell-Nash equilibrium (BNE) exists for a single controller additive reward (SC-AR) stochastic game which is a special case of a general stochastic game. We show that, in general, the additive reward condition is needed for the existence of a BNE. We give an example of a single controller stochastic game which does not satisfy additive reward condition. We show that this example does not have a stationary BNE. For a general discrete time discounted stochastic game we give two different sets of conditions and show that a stationary Nash equilibrium that satisfies any set of conditions is a BNE. One of these sets of conditions weakens a set of conditions available in the literature. For continuous time stochastic games, we give an example that does not have a stationary BNE. In fact, this example is a single controller continuous time stochastic game. Then, we introduce a continuous time SC-AR stochastic game. We show that there always exists a stationary deterministic BNE for continuous time SC-AR stochastic game. For a general continuous time discounted stochastic game we give two different sets of conditions and show that a Nash equilibrium that satisfies any set of conditions is a BNE.

Submitted 3 May, 2016; originally announced May 2016.

MSC Class: 91A05; 91A10; 91A15; 90C40

arXiv:1206.1672 [pdf, ps, other]

A mathematical programming based characterization of Nash equilibria of some constrained stochastic games

Authors: Vikas Vikram Singh, N. Hemachandra

Abstract: We consider two classes of constrained finite state-action stochastic games. First, we consider a two player nonzero sum single controller constrained stochastic game with both average and discounted cost criterion. We consider the same type of constraints as in [1], i.e., player 1 has subscription based constraints and player 2, who controls the transition probabilities, has realization based constraints which can also depend on the strategies of player 1. Next, we consider a N -player nonzero sum constrained stochastic game with independent state processes where each player has average cost criterion as discussed in [2]. We show that the stationary Nash equilibria of both classes of constrained games, which exists under strong Slater and irreducibility conditions [3], [2], has one to one correspondence with global minima of certain mathematical programs. In the single controller game if the constraints of player 2 do not depend on the strategies of the player 1, then the mathematical program reduces to the non-convex quadratic program. In two player independent state processes stochastic game if the constraints of a player do not depend on the strategies of another player, then the mathematical program reduces to a non-convex quadratic program. Computational algorithms for finding global minima of non-convex quadratic program exist [4], [5] and hence, one can compute Nash equilibria of these constrained stochastic games. Our results generalize some existing results for zero sum games [1], [6], [7].

Submitted 8 June, 2012; originally announced June 2012.

MSC Class: 91A10; 91A15; 90C05; 90C20; 90C26

arXiv:math/0212006 [pdf, ps, other]

Bounds for covariances and variances of truncated random variables

Authors: N. Hemachandra, V. Cheriyan

Abstract: We show that a lower bound for covariance of $\min(X_1,X_2)$ and $\max(X_1,X_2)$ is $\operatorname{Cov}(X_1,X_2)$ and an upper bound for variance of $\min(X_2,\max(X,X_1))$ is $\operatorname{Var}(X) + \operatorname{Var}(X_1) + \operatorname{Var}(X_2)$, generalizing previous results. We also characterize the cases where these bounds are sharp.

Submitted 1 December, 2002; originally announced December 2002.

Comments: 7 pages. Revised during October 2002

Report number: 02_2002 MSC Class: 60

Showing 1–24 of 24 results for author: Hemachandra, N