
Showing 1–34 of 34 results for author: Menon, A

  1. arXiv:2504.11284 [pdf, ps, other]

    cs.LG cs.AI cs.IR stat.ML

    Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation

    Authors: Michal Lukasik, Lin Chen, Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Felix X. Yu, Sashank J. Reddi, Gang Fu, Mohammadhossein Bateni, Sanjiv Kumar

    Abstract: Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal Area Under the ROC Curve (AUC) against a single binary target label. However, one may often observe multiple binary target labels, e.g., from distinct human annotators. How can one synthesize such labels into a single coherent ranking? In this work, we formally analyze tw…

    Submitted 9 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Accepted by ICML 2025

  2. arXiv:2406.17968 [pdf, other]

    cs.IR cs.AI cs.LG stat.ML

    Efficient Document Ranking with Learnable Late Interactions

    Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been p…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2307.02764 [pdf, other]

    cs.LG stat.ML

    When Does Confidence-Based Cascade Deferral Suffice?

    Authors: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers is invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite…

    Submitted 23 January, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  4. arXiv:2302.01576 [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    ResMem: Learn what you can and memorize the rest

    Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a ne…

    Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  5. arXiv:2301.12923 [pdf, other]

    cs.LG cs.AI stat.ML

    On student-teacher deviations in distillation: does it pay to disobey?

    Authors: Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

    Abstract: Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being trained to fit the teacher's probabilities, the student may not only significantly deviate from the teacher probabilities, but may also outdo the teacher in…

    Submitted 18 March, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

  6. arXiv:2204.13208 [pdf, other]

    cs.LG stat.ML

    ELM: Embedding and Logit Margins for Long-Tail Learning

    Authors: Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural m…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: 24 pages

  7. arXiv:2111.08805 [pdf, ps, other]

    stat.ML cs.LG q-fin.RM

    Online Estimation and Optimization of Utility-Based Shortfall Risk

    Authors: Vishwajit Hegde, Arvind S. Menon, L. A. Prashanth, Krishna Jagannathan

    Abstract: Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly popular in financial applications, owing to certain desirable properties that it enjoys. We consider the problem of estimating UBSR in a recursive setting, where samples from the underlying loss distribution are available one-at-a-time. We cast the UBSR estimation problem as a root finding problem, and propose stochastic app…

    Submitted 27 November, 2023; v1 submitted 16 November, 2021; originally announced November 2021.

  8. arXiv:2105.05736 [pdf, other]

    cs.LG stat.ML

    Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

    Abstract: Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade off pe…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: To appear in ICML 2021

  9. arXiv:2104.07932 [pdf, other]

    cs.LG cs.CE stat.ML

    Interval-censored Hawkes processes

    Authors: Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie

    Abstract: Interval-censored data solely records the aggregated counts of events during specific time intervals - such as the number of patients admitted to the hospital or the volume of vehicles passing traffic loop detectors - and not the exact occurrence time of the events. It is currently not understood how to fit the Hawkes point processes to this kind of data. Its typical loss function (the point proce…

    Submitted 25 November, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

    Journal ref: Journal of Machine Learning Research, 23(338):1-84, 2022. https://jmlr.org/papers/v23/21-0917.html

  10. arXiv:2102.06849 [pdf, other]

    cs.LG cs.AI stat.ML

    Distilling Double Descent

    Authors: Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

    Abstract: Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset. The most common explanations for why distillation "works" are predicated on the assumption that the student is provided with soft labels, e.g., probabilities or confidences, from the teacher model. In this work, we show, that,…

    Submitted 12 February, 2021; originally announced February 2021.

  11. arXiv:2007.12865 [pdf, other]

    cs.LG cs.IR stat.ML

    Self-supervised Learning for Large-scale Item Recommendations

    Authors: Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger

    Abstract: Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the…

    Submitted 24 February, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

  12. arXiv:2007.07314 [pdf, other]

    cs.LG stat.ML

    Long-tail learning via logit adjustment

    Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

    Abstract: Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these chall…

    Submitted 9 July, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: Published as a conference paper in ICLR 2021

  13. arXiv:2005.10419 [pdf, other]

    cs.LG stat.ML

    Why distillation helps: a statistical perspective

    Authors: Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

    Abstract: Knowledge distillation is a technique for improving the performance of a simple "student" model by replacing its one-hot training labels with a distribution over labels obtained from a complex "teacher" model. While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help? In this paper, we present a statistical perspective on distillation w…

    Submitted 20 May, 2020; originally announced May 2020.

  14. arXiv:2004.11460 [pdf, other]

    q-bio.QM cs.CY cs.LG stat.ML

    Development of a Machine Learning Model and Mobile Application to Aid in Predicting Dosage of Vitamin K Antagonists Among Indian Patients

    Authors: Amruthlal M, Devika S, Ameer Suhail P A, Aravind K Menon, Vignesh Krishnan, Alan Thomas, Manu Thomas, Sanjay G, Lakshmi Kanth L R, Jimmy Jose, Harikrishnan S

    Abstract: Patients who undergo mechanical heart valve replacements or have conditions like Atrial Fibrillation have to take Vitamin K Antagonists (VKA) drugs to prevent coagulation of blood. These drugs have a narrow therapeutic range and need to be very closely monitored due to life-threatening side effects. The dosage of VKA drug is determined and revised by a physician based on Prothrombin Time - Internati…

    Submitted 19 April, 2020; originally announced April 2020.

  15. arXiv:2004.10915 [pdf, other]

    cs.LG stat.ML

    Doubly-stochastic mining for heterogeneous retrieval

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge. The first challenge concerns scalability: with a large number of labels, standard losses are difficult to optimise even on a single example.…

    Submitted 22 April, 2020; originally announced April 2020.

  16. arXiv:2004.10342 [pdf, ps, other]

    cs.LG stat.ML

    Federated Learning with Only Positive Labels

    Authors: Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class. As a result, during each federated learning round, the users need to locally update the classifier without having access to the features and the model parameters for the negative classes. Thus, naively employing conventional decentra…

    Submitted 21 April, 2020; originally announced April 2020.

  17. arXiv:2004.05465 [pdf, other]

    cs.LG stat.ML

    Robust Large-Margin Learning in Hyperbolic Space

    Authors: Melanie Weber, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar

    Abstract: Recently, there has been a surge of interest in representation learning in hyperbolic spaces, driven by their ability to represent hierarchical data with significantly fewer dimensions than standard Euclidean spaces. However, the viability and benefits of hyperbolic spaces for downstream machine learning tasks have received less attention. In this paper, we present, to our knowledge, the first the…

    Submitted 1 November, 2022; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: Revision corrects error in section 3.1

  18. arXiv:2003.02819 [pdf, other]

    cs.LG stat.ML

    Does label smoothing mitigate label noise?

    Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors. Empirically, smoothing has been shown to improve both predictive performance and model calibration. In this paper, we study whether label smoothing is also effective as a means of coping with label noise. While label smoothing apparently amplifies this problem --…

    Submitted 5 March, 2020; originally announced March 2020.

  19. arXiv:2002.03555 [pdf, other]

    cs.LG stat.ML

    Supervised Learning: No Loss No Cry

    Authors: Richard Nock, Aditya Krishna Menon

    Abstract: Supervised learning requires the specification of a loss function to minimise. While the theory of admissible losses from both a computational and statistical perspective is well-developed, these offer a panoply of different choices. In practice, this choice is typically made in an ad hoc manner. In hopes of making this procedure more principled, the problem of learning the loss funct…

    Submitted 10 February, 2020; originally announced February 2020.

    ACM Class: I.2.6

  20. arXiv:1909.09667 [pdf, other]

    cs.LG stat.ML

    Online Hierarchical Clustering Approximations

    Authors: Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

    Abstract: Hierarchical clustering is a widely used approach for clustering datasets at multiple levels of granularity. Despite its popularity, existing algorithms such as hierarchical agglomerative clustering (HAC) are limited to the offline setting, and thus require the entire dataset to be available. This prohibits their use on large datasets commonly encountered in modern learning applications. In this p…

    Submitted 20 September, 2019; originally announced September 2019.

    Comments: 17 pages, 3 figures

  21. arXiv:1901.10837 [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Noise-tolerant fair classification

    Authors: Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma

    Abstract: Fairness-aware learning involves designing algorithms that do not discriminate with respect to some sensitive feature (e.g., race or gender). Existing work on the problem operates under the assumption that the sensitive feature available in one's training sample is perfectly reliable. This assumption may be violated in many real-world cases: for example, respondents to a survey may choose to conce…

    Submitted 9 January, 2020; v1 submitted 30 January, 2019; originally announced January 2019.

  22. arXiv:1901.08665 [pdf, other]

    cs.LG stat.ML

    Fairness risk measures

    Authors: Robert C. Williamson, Aditya Krishna Menon

    Abstract: Ensuring that classifiers are non-discriminatory or fair with respect to a sensitive feature (e.g., race or gender) is a topical problem. Progress in this task requires fixing a definition of fairness, and there have been several proposals in this regard over the past few years. Several of these, however, assume either binary sensitive features (thus precluding categorical or real-valued sensitive…

    Submitted 24 January, 2019; originally announced January 2019.

  23. arXiv:1901.06125 [pdf, other]

    cs.IR cs.LG stat.ML

    Cold-start Playlist Recommendation with Multitask Learning

    Authors: Dawei Chen, Cheng Soon Ong, Aditya Krishna Menon

    Abstract: Playlist recommendation involves producing a set of songs that a user might enjoy. We investigate this problem in three cold-start scenarios: (i) cold playlists, where we recommend songs to form new personalised playlists for an existing user; (ii) cold users, where we recommend songs to form new playlists for a new user; and (iii) cold songs, where we recommend newly released songs to extend user…

    Submitted 18 January, 2019; originally announced January 2019.

    Comments: 15 pages

    MSC Class: 68T05

  24. arXiv:1812.02171 [pdf, other]

    cs.IR cs.LG stat.ML

    Comparative Document Summarisation via Classification

    Authors: Umanga Bista, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, Lexing Xie

    Abstract: This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summar…

    Submitted 2 January, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

    Comments: Accepted for AAAI 2019

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019

  25. arXiv:1810.04327 [pdf, other]

    stat.ML cs.LG

    Complementary-Label Learning for Arbitrary Losses and Models

    Authors: Takashi Ishida, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

    Abstract: In contrast to the standard classification paradigm where the true class is given to each training pattern, complementary-label learning only uses training patterns each equipped with a complementary label, which only specifies one of the classes that the pattern does not belong to. The goal of this paper is to derive a novel framework of complementary-label learning with an unbiased estimator of…

    Submitted 18 November, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

    Comments: accepted to ICML 2019 (Added errata on Nov. 19, 2019)

  26. arXiv:1808.10585 [pdf, other]

    stat.ML cs.LG

    On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

    Authors: Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

    Abstract: Empirical risk minimization (ERM), with a proper loss function and regularization, is the common practice of supervised classification. In this paper, we study training an arbitrary (from linear to deep) binary classifier from only unlabeled (U) data by ERM. We prove that it is impossible to estimate the risk of an arbitrary binary classifier in an unbiased manner given a single set of U data, but it b…

    Submitted 12 March, 2019; v1 submitted 30 August, 2018; originally announced August 2018.

  27. arXiv:1806.02977 [pdf, other]

    cs.LG stat.ML

    Monge blunts Bayes: Hardness Results for Adversarial Training

    Authors: Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

    Abstract: The last few years have seen a staggering number of empirical studies of the robustness of neural networks in a model of adversarial perturbations of their inputs. Most rely on an adversary which carries out local modifications within prescribed balls. None however has so far questioned the broader picture: how to frame a resource-bounded adversary so that it can be severely detrimental to learnin…

    Submitted 7 May, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

    ACM Class: I.2.6

  28. arXiv:1802.06360 [pdf, other]

    cs.LG cs.NE stat.ML

    Anomaly Detection using One-Class Neural Networks

    Authors: Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

    Abstract: We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets. OC-NN combines the ability of deep networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data. The OC-NN approach breaks new ground for the following crucial reason: data representation in the hidden layer is driven by the O…

    Submitted 10 January, 2019; v1 submitted 18 February, 2018; originally announced February 2018.

  29. arXiv:1707.04385 [pdf, other]

    cs.LG stat.ML

    f-GANs in an Information Geometric Nutshell

    Authors: Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

    Abstract: Nowozin et al. showed last year how to extend the GAN principle to all $f$-divergences. The approach is elegant but falls short of a full description of the supervised game, and says little about the key player, the generator: for example, what does the generator actually converge to if solving the GAN game means convergence in some space of parameters? How does that provide hints…

    Submitted 14 July, 2017; originally announced July 2017.

    ACM Class: I.2.6; I.5.1

  30. arXiv:1704.06743 [pdf, other]

    cs.LG cs.CV stat.ML

    Robust, Deep and Inductive Anomaly Detection

    Authors: Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

    Abstract: PCA is a classical statistical technique whose simplicity and maturity have seen it find widespread use as an anomaly detection technique. However, it is limited in this regard by being sensitive to gross perturbations of the input, and by seeking a linear subspace that captures normal behaviour. The first issue has been dealt with by robust PCA, a variant of PCA that explicitly allows for some dat…

    Submitted 30 July, 2017; v1 submitted 22 April, 2017; originally announced April 2017.

    Comments: Accepted at ECML PKDD 2017 (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery), Skopje, Macedonia, 18-22 September

  31. arXiv:1609.03683 [pdf, other]

    stat.ML cs.LG

    Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach

    Authors: Giorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, Lizhen Qu

    Abstract: We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted…

    Submitted 22 March, 2017; v1 submitted 13 September, 2016; originally announced September 2016.

    Comments: Oral paper at CVPR 2017

  32. arXiv:1607.00360 [pdf, other]

    cs.LG stat.ML

    A scaled Bregman theorem with applications

    Authors: Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

    Abstract: Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions" (employing a…

    Submitted 1 July, 2016; originally announced July 2016.

  33. arXiv:1506.01520 [pdf, other]

    stat.ML cs.LG

    An Average Classification Algorithm

    Authors: Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

    Abstract: Many classification algorithms produce a classifier that is a weighted average of kernel evaluations. When working with a high or infinite dimensional kernel, it is imperative for speed of evaluation and storage issues that as few training samples as possible are used in the kernel expansion. Popular existing approaches focus on altering standard learning algorithms, such as the Support Vector Mac…

    Submitted 15 December, 2015; v1 submitted 4 June, 2015; originally announced June 2015.

  34. arXiv:1206.4661 [pdf]

    cs.LG stat.ML

    Predicting accurate probabilities with a ranking loss

    Authors: Aditya Menon, Xiaoqian Jiang, Shankar Vembu, Charles Elkan, Lucila Ohno-Machado

    Abstract: In many real-world applications of machine learning classifiers, it is essential to predict the probability of an example belonging to a particular class. This paper proposes a simple technique for predicting probabilities based on optimizing a ranking loss, followed by isotonic regression. This semi-parametric technique offers both good ranking and regression performance, and models a richer set…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012