Showing 1–16 of 16 results for author: Ramaswamy, H G

  1. arXiv:2505.17777  [pdf, ps, other]

    cs.LG

    Optimizing Shortfall Risk Metric for Learning Regression Models

    Authors: Harish G. Ramaswamy, L. A. Prashanth

    Abstract: We consider the problem of estimating and optimizing utility-based shortfall risk (UBSR) of a loss, say $(Y - \hat Y)^2$, in the context of a regression problem. Empirical risk minimization with a UBSR objective is challenging since UBSR is a non-linear function of the underlying distribution. We first derive a concentration bound for UBSR estimation using independent and identically distributed (…

    Submitted 11 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2411.04569  [pdf, other]

    cs.LG cs.AI

    Impact of Label Noise on Learning Complex Features

    Authors: Rahul Vashisht, P. Krishna Kumar, Harsha Vardhan Govind, Harish G. Ramaswamy

    Abstract: Neural networks trained with stochastic gradient descent exhibit an inductive bias towards simpler decision boundaries, typically converging to a narrow family of functions, and often fail to capture more complex features. This phenomenon raises concerns about the capacity of deep models to adequately learn and represent real-world datasets. Traditional approaches such as explicit regularization,…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: Accepted at Workshop on Scientific Methods for Understanding Deep Learning, NeurIPS 2024

  3. arXiv:2408.09266  [pdf, other]

    cs.LG

    Graph Classification with GNNs: Optimisation, Representation and Inductive Bias

    Authors: P. Krishna Kumar, Harish G. Ramaswamy

    Abstract: Theoretical studies on the representation power of GNNs have been centered around understanding the equivalence of GNNs, using WL-Tests for detecting graph isomorphism. In this paper, we argue that such equivalence ignores the accompanying optimization issues and does not provide a holistic view of the GNN learning process. We illustrate these gaps between representation and optimization with exam…

    Submitted 23 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

  4. arXiv:2404.04312  [pdf, other]

    cs.LG cs.AI cs.NE

    Half-Space Feature Learning in Neural Networks

    Authors: Mahesh Lorik Yadav, Harish Guruprasad Ramaswamy, Chandrashekar Lakshminarayanan

    Abstract: There currently exist two extreme viewpoints for neural network feature learning -- (i) Neural networks simply implement a kernel method (a la NTK) and hence no features are learned (ii) Neural networks can represent (and hence learn) intricate hierarchical features suitable for the data. We argue in this paper neither interpretation is likely to be correct based on a novel viewpoint. Neural netwo…

    Submitted 5 April, 2024; originally announced April 2024.

  5. On the Learning Dynamics of Attention Networks

    Authors: Rahul Vashisht, Harish G. Ramaswamy

    Abstract: Attention models are typically learned by optimizing one of three standard loss functions that are variously called -- soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models -- a `focus' model that `selects' the right \textit{segment} of the input and a `classification' model that processes…

    Submitted 12 October, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Proceedings at ECAI-2023 IOS Press

  6. arXiv:2212.14776  [pdf, ps, other]

    cs.LG

    On the Interpretability of Attention Networks

    Authors: Lakshmi Narayan Pandey, Rahul Vashisht, Harish G. Ramaswamy

    Abstract: Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: ''The output depends only on a small (but unknown) segment of the input.'' In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes…

    Submitted 14 May, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

    Comments: ACML 2022, PMLR, Volume 189, https://proceedings.mlr.press/v189/pandey23a/pandey23a.pdf

    Journal ref: Proceedings of The 14th Asian Conference on Machine Learning, 832--847, 2023, Volume:189; PMLR

  7. arXiv:2210.09695  [pdf, other]

    stat.ML cs.LG

    Consistent Multiclass Algorithms for Complex Metrics and Constraints

    Authors: Harikrishna Narasimhan, Harish G. Ramaswamy, Shiv Kumar Tavker, Drona Khurana, Praneeth Netrapalli, Shivani Agarwal

    Abstract: We present consistent algorithms for multiclass learning with complex performance metrics and constraints, where the objective and constraints are defined by arbitrary functions of the confusion matrix. This setting includes many common performance metrics such as the multiclass G-mean and micro F1-measure, and constraints such as those on the classifier's precision and recall and more recent meas…

    Submitted 18 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

  8. arXiv:2111.13075  [pdf, other]

    cs.LG

    Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)

    Authors: Umangi Jain, Harish G. Ramaswamy

    Abstract: Despite their massive success, training successful deep neural networks still largely relies on experimentally choosing an architecture, hyper-parameters, initialization, and training mechanism. In this work, we focus on determining the success of standard gradient descent method for training deep neural networks on a specified dataset, architecture, and initialization (DAI) combination. Through e…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: 10 pages, 9 figures

  9. arXiv:2012.08854  [pdf, ps, other]

    cs.LG stat.ML

    Using noise resilience for ranking generalization of deep neural networks

    Authors: Depen Morwani, Rahul Vashisht, Harish G. Ramaswamy

    Abstract: Recent papers have shown that sufficiently overparameterized neural networks can perfectly fit even random labels. Thus, it is crucial to understand the underlying reason behind the generalization performance of a network on real-world data. In this work, we propose several measures to predict the generalization error of a network given the training data and its parameters. Using one of these meas…

    Submitted 16 December, 2020; originally announced December 2020.

    ACM Class: I.5.1

  10. arXiv:2010.12909  [pdf, other]

    cs.LG stat.ML

    Inductive Bias of Gradient Descent for Weight Normalized Smooth Homogeneous Neural Nets

    Authors: Depen Morwani, Harish G. Ramaswamy

    Abstract: We analyze the inductive bias of gradient descent for weight normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss. We analyse both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate. We extend these res…

    Submitted 31 January, 2023; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: Accepted to ALT 2022

    ACM Class: I.5.1; I.2.6

  11. arXiv:2009.07801  [pdf, other]

    stat.ML cs.LG

    Convex Calibrated Surrogates for the Multi-Label F-Measure

    Authors: Mingyuan Zhang, Harish G. Ramaswamy, Shivani Agarwal

    Abstract: The F-measure is a widely used performance measure for multi-label classification, where multiple labels can be active in an instance simultaneously (e.g. in image tagging, multiple tags can be active in any image). In particular, the F-measure explicitly balances recall (fraction of active labels predicted to be active) and precision (fraction of labels predicted to be active that are actually so…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: Accepted to ICML 2020

  12. arXiv:1810.11975  [pdf, other]

    cs.LG cs.CL stat.ML

    On Controllable Sparse Alternatives to Softmax

    Authors: Anirban Laha, Saneem A. Chemmengath, Priyanka Agrawal, Mitesh M. Khapra, Karthik Sankaranarayanan, Harish G. Ramaswamy

    Abstract: Converting an n-dimensional vector to a probability distribution over n objects is a commonly used component in many machine learning tasks like multiclass classification, multilabel classification, attention mechanisms etc. For this, several probability mapping functions have been proposed and employed in literature such as softmax, sum-normalization, spherical softmax, and sparsemax, but there i…

    Submitted 30 October, 2018; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: To appear in NIPS 2018, Total 16 pages including appendix

  13. arXiv:1603.02501  [pdf, other]

    cs.LG stat.ML

    Mixture Proportion Estimation via Kernel Embedding of Distributions

    Authors: Harish G. Ramaswamy, Clayton Scott, Ambuj Tewari

    Abstract: Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods…

    Submitted 31 May, 2016; v1 submitted 8 March, 2016; originally announced March 2016.

  14. arXiv:1505.04137  [pdf, other]

    cs.LG stat.ML

    Consistent Algorithms for Multiclass Classification with a Reject Option

    Authors: Harish G. Ramaswamy, Ambuj Tewari, Shivani Agarwal

    Abstract: We consider the problem of $n$-class classification ($n\geq 2$), where the classifier can choose to abstain from making predictions at a given cost, say, a factor $α$ of the cost of misclassification. Designing consistent algorithms for such $n$-class classification problems with a `reject option' is the main goal of this paper, thereby extending and generalizing previously known results for…

    Submitted 15 May, 2015; originally announced May 2015.

  15. arXiv:1501.00287  [pdf, ps, other]

    cs.LG stat.ML

    Consistent Classification Algorithms for Multi-class Non-Decomposable Performance Metrics

    Authors: Harish G. Ramaswamy, Harikrishna Narasimhan, Shivani Agarwal

    Abstract: We study consistency of learning algorithms for a multi-class performance metric that is a non-decomposable function of the confusion matrix of a classifier and cannot be expressed as a sum of losses on individual data points; examples of such performance metrics include the macro F-measure popular in information retrieval and the G-mean metric used in class-imbalanced problems. While there has be…

    Submitted 1 January, 2015; originally announced January 2015.

  16. arXiv:1408.2764  [pdf, other]

    cs.LG stat.ML

    Convex Calibration Dimension for Multiclass Loss Matrices

    Authors: Harish G. Ramaswamy, Shivani Agarwal

    Abstract: We study consistency properties of surrogate loss functions for general multiclass learning problems, defined by a general multiclass loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient c…

    Submitted 23 August, 2015; v1 submitted 12 August, 2014; originally announced August 2014.

    Comments: Accepted to JMLR, pending editing