Skip to main content

Showing 1–50 of 73 results for author: Menon, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2504.11284  [pdf, ps, other

    cs.LG cs.AI cs.IR stat.ML

    Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation

    Authors: Michal Lukasik, Lin Chen, Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Felix X. Yu, Sashank J. Reddi, Gang Fu, Mohammadhossein Bateni, Sanjiv Kumar

    Abstract: Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal Area Under the ROC Curve (AUC) against a single binary target label. However, one may often observe multiple binary target labels, e.g., from distinct human annotators. How can one synthesize such labels into a single coherent ranking? In this work, we formally analyze tw… ▽ More

    Submitted 9 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Accepted by ICML 2025

  2. arXiv:2502.08773  [pdf, other

    cs.CL cs.LG

    Universal Model Routing for Efficient LLM Inference

    Authors: Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Zifeng Wang, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Large language models' significant advances in capabilities are accompanied by significant increases in inference costs. Model routing is a simple technique for reducing inference cost, wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the probl… ▽ More

    Submitted 12 February, 2025; originally announced February 2025.

  3. arXiv:2410.18779  [pdf, other

    cs.LG cs.CL

    A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

    Authors: Ankit Singh Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, Vladimir Feinberg, Seungyeon Kim, Hrayr Harutyunyan, Nikunj Saunshi, Zachary Nado, Rakesh Shivanna, Sashank J. Reddi, Aditya Krishna Menon, Rohan Anil, Sanjiv Kumar

    Abstract: A primary challenge in large language model (LLM) development is their onerous pre-training cost. Typically, such pre-training involves optimizing a self-supervised objective (such as next-token prediction) over a large corpus. This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by suitably leveraging a small language model (SLM). In particular, this paradig… ▽ More

    Submitted 24 October, 2024; originally announced October 2024.

  4. arXiv:2409.02247  [pdf, other

    physics.flu-dyn cs.CE math.ST physics.comp-ph physics.med-ph

    Personalized and uncertainty-aware coronary hemodynamics simulations: From Bayesian estimation to improved multi-fidelity uncertainty quantification

    Authors: Karthik Menon, Andrea Zanoni, Owais Khan, Gianluca Geraci, Koen Nieman, Daniele E. Schiavazzi, Alison L. Marsden

    Abstract: Simulations of coronary hemodynamics have improved non-invasive clinical risk stratification and treatment outcomes for coronary artery disease, compared to relying on anatomical imaging alone. However, simulations typically use empirical approaches to distribute total coronary flow amongst the arteries in the coronary tree. This ignores patient variability, the presence of disease, and other clin… ▽ More

    Submitted 3 September, 2024; originally announced September 2024.

  5. arXiv:2406.17968  [pdf, other

    cs.IR cs.AI cs.LG stat.ML

    Efficient Document Ranking with Learnable Late Interactions

    Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been p… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  6. arXiv:2406.00060  [pdf, other

    cs.CL cs.LG

    Cascade-Aware Training of Language Models

    Authors: Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

    Abstract: Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employ smaller models for simpler queries. Cascaded systems are typically built with independently trained models, neglecting the advantages of considering inference-time interactions of the… ▽ More

    Submitted 29 May, 2024; originally announced June 2024.

    Comments: 22 pages, 13 figures

  7. arXiv:2405.19261  [pdf, other

    cs.CL cs.AI cs.LG

    Faster Cascades via Speculative Decoding

    Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in p… ▽ More

    Submitted 21 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  8. arXiv:2404.14187  [pdf, other

    cs.CE

    Bayesian Windkessel calibration using optimized 0D surrogate models

    Authors: Jakob Richter, Jonas Nitzler, Luca Pegolotti, Karthik Menon, Jonas Biehler, Wolfgang A. Wall, Daniele E. Schiavazzi, Alison L. Marsden, Martin R. Pfaller

    Abstract: Boundary condition (BC) calibration to assimilate clinical measurements is an essential step in any subject-specific simulation of cardiovascular fluid dynamics. Bayesian calibration approaches have successfully quantified the uncertainties inherent in identified parameters. Yet, routinely estimating the posterior distribution for all BC parameters in 3D simulations has been unattainable due to th… ▽ More

    Submitted 29 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  9. arXiv:2404.10136  [pdf, other

    cs.CL cs.AI cs.LG

    Language Model Cascades: Token-level uncertainty and beyond

    Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning c… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  10. arXiv:2403.04182  [pdf, other

    cs.CL cs.AI

    Regression-aware Inference with LLMs

    Authors: Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Large language models (LLMs) have shown strong results on a range of applications, including regression and scoring tasks. Typically, one obtains outputs from an LLM via autoregressive sampling from the model's output distribution. We show that this inference strategy can be sub-optimal for common regression and scoring evaluation metrics. As a remedy, we build on prior work on Minimum Bayes Risk… ▽ More

    Submitted 1 November, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: EMNLP Findings 2024

    Journal ref: EMNLP Findings 2024

  11. arXiv:2402.00090  [pdf

    q-bio.NC cs.HC

    Classification of attention performance post-longitudinal tDCS via functional connectivity and machine learning methods

    Authors: Akash K Rao, Vishnu K Menon, Arnav Bhavsar, Shubhajit Roy Chowdhury, Ramsingh Negi, Varun Dutt

    Abstract: Attention is the brain's mechanism for selectively processing specific stimuli while filtering out irrelevant information. Characterizing changes in attention following long-term interventions (such as transcranial direct current stimulation (tDCS)) has seldom been emphasized in the literature. To classify attention performance post-tDCS, this study uses functional connectivity and machine learnin… ▽ More

    Submitted 31 January, 2024; originally announced February 2024.

    Comments: 6 pages, to be presented in the IEEE 9th International Conference for Convergence in Technology (I2CT),Pune, April 2024. arXiv admin note: substantial text overlap with arXiv:2401.17700

  12. arXiv:2401.17745  [pdf

    cs.RO

    Gesture Controlled Robot For Human Detection

    Authors: Athira T. S, Honey Manoj, R S Vishnu Priya, Vishnu K Menon, Srilekshmi M

    Abstract: It is very important to locate survivors from collapsed buildings so that rescue operations can be arranged. Many lives are lost due to lack of competent systems to detect people in these collapsed buildings at the right time. So here we have designed a hand gesture controlled robot which is capable of detecting humans under these collapsed building parts. The proposed work can be used to access s… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 6 pages, presented at the 2nd International Conference on IoT Based Control Networks and Intelligent Systems(ICICNIS 2021)

    Journal ref: proceedings of International Conference on IoT Based Control Networks & Intelligent Systems - ICICNIS 2021, 6 pages,2021

  13. arXiv:2401.17711  [pdf

    cs.HC cs.AI

    Prediction of multitasking performance post-longitudinal tDCS via EEG-based functional connectivity and machine learning methods

    Authors: Akash K Rao, Shashank Uttrani, Vishnu K Menon, Darshil Shah, Arnav Bhavsar, Shubhajit Roy Chowdhury, Varun Dutt

    Abstract: Predicting and understanding the changes in cognitive performance, especially after a longitudinal intervention, is a fundamental goal in neuroscience. Longitudinal brain stimulation-based interventions like transcranial direct current stimulation (tDCS) induce short-term changes in the resting membrane potential and influence cognitive processes. However, very little research has been conducted o… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 16 pages, presented at the 30th International Conference on Neural Information Processing (ICONIP2023), Changsha, China, November 2023

  14. arXiv:2401.17705  [pdf

    cs.LG cs.HC

    Predicting suicidal behavior among Indian adults using childhood trauma, mental health questionnaires and machine learning cascade ensembles

    Authors: Akash K Rao, Gunjan Y Trivedi, Riri G Trivedi, Anshika Bajpai, Gajraj Singh Chauhan, Vishnu K Menon, Kathirvel Soundappan, Hemalatha Ramani, Neha Pandya, Varun Dutt

    Abstract: Among young adults, suicide is India's leading cause of death, accounting for an alarming national suicide rate of around 16%. In recent years, machine learning algorithms have emerged to predict suicidal behavior using various behavioral traits. But to date, the efficacy of machine learning algorithms in predicting suicidal behavior in the Indian context has not been explored in literature. In th… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 11 pages, presnted at the 4th International Conference on Frontiers in Computing and Systems (COMSYS 2023), Himachal Pradesh, October 2023

  15. arXiv:2401.17700  [pdf

    cs.HC cs.AI

    Classification of executive functioning performance post-longitudinal tDCS using functional connectivity and machine learning methods

    Authors: Akash K Rao, Vishnu K Menon, Shashank Uttrani, Ayushman Dixit, Dipanshu Verma, Varun Dutt

    Abstract: Executive functioning is a cognitive process that enables humans to plan, organize, and regulate their behavior in a goal-directed manner. Understanding and classifying the changes in executive functioning after longitudinal interventions (like transcranial direct current stimulation (tDCS)) has not been explored in the literature. This study employs functional connectivity and machine learning al… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 7 pages, presented at the IEEE 20th India Council International Conference (INDICON 2023), Hyderabad, India, December 2023

  16. arXiv:2312.00854  [pdf, other

    physics.med-ph cs.AI cs.LG math.NA stat.CO

    A Probabilistic Neural Twin for Treatment Planning in Peripheral Pulmonary Artery Stenosis

    Authors: John D. Lee, Jakob Richter, Martin R. Pfaller, Jason M. Szafron, Karthik Menon, Andrea Zanoni, Michael R. Ma, Jeffrey A. Feinstein, Jacqueline Kreutzer, Alison L. Marsden, Daniele E. Schiavazzi

    Abstract: The substantial computational cost of high-fidelity models in numerical hemodynamics has, so far, relegated their use mainly to offline treatment planning. New breakthroughs in data-driven architectures and optimization techniques for fast surrogate modeling provide an exciting opportunity to overcome these limitations, enabling the use of such technology for time-critical decisions. We discuss an… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

  17. arXiv:2310.08461  [pdf, other

    cs.CL cs.AI cs.LG

    DistillSpec: Improving Speculative Decoding via Knowledge Distillation

    Authors: Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

    Abstract: Speculative decoding (SD) accelerates large language model inference by employing a faster draft model for generating multiple tokens, which are then verified in parallel by the larger target model, resulting in the text generated according to the target model distribution. However, identifying a compact draft model that is well-aligned with the target model is challenging. To tackle this issue, w… ▽ More

    Submitted 30 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  18. arXiv:2310.05337  [pdf, other

    cs.LG cs.CV

    What do larger image classifiers memorise?

    Authors: Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels. To carefully study this issue, Feldman proposed a metric to quantify the degree of memorisation of individual training examples, and empirically computed the correspondi… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

    MSC Class: Machine Learning (cs.LG); Artificial Intelligence (cs.AI) Machine Learning (stat.ML)

  19. arXiv:2310.02226  [pdf, other

    cs.CL cs.AI cs.LG

    Think before you speak: Training Language Models With Pause Tokens

    Authors: Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

    Abstract: Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on lan… ▽ More

    Submitted 20 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  20. arXiv:2307.11106  [pdf, other

    cs.LG cs.CR cs.IT

    The importance of feature preprocessing for differentially private linear optimization

    Authors: Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon

    Abstract: Training machine learning models with differential privacy (DP) has received increasing interest in recent years. One of the most popular algorithms for training differentially private models is differentially private stochastic gradient descent (DPSGD) and its variants, where at each step gradients are clipped and combined with some noise. Given the increasing usage of DPSGD, we ask the question:… ▽ More

    Submitted 19 February, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

  21. arXiv:2307.02764  [pdf, other

    cs.LG stat.ML

    When Does Confidence-Based Cascade Deferral Suffice?

    Authors: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite… ▽ More

    Submitted 23 January, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  22. arXiv:2303.04630  [pdf

    cs.LG q-bio.QM stat.AP

    Mining the contribution of intensive care clinical course to outcome after traumatic brain injury

    Authors: Shubhayu Bhattacharyay, Pier Francesco Caruso, Cecilia Åkerlund, Lindsay Wilson, Robert D Stevens, David K Menon, Ewout W Steyerberg, David W Nelson, Ari Ercole, the CENTER-TBI investigators/participants

    Abstract: Existing methods to characterise the evolving condition of traumatic brain injury (TBI) patients in the intensive care unit (ICU) do not capture the context necessary for individualising treatment. Here, we integrate all heterogenous data stored in medical records (1,166 pre-ICU and ICU variables) to model the individualised contribution of clinical course to six-month functional outcome on the Gl… ▽ More

    Submitted 1 August, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Journal ref: npj Digit. Med. 6, 154 (2023)

  23. arXiv:2302.01576  [pdf, other

    cs.LG cs.AI stat.ME stat.ML

    ResMem: Learn what you can and memorize the rest

    Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a ne… ▽ More

    Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  24. arXiv:2301.12923  [pdf, other

    cs.LG cs.AI stat.ML

    On student-teacher deviations in distillation: does it pay to disobey?

    Authors: Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

    Abstract: Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being trained to fit the teacher's probabilities, the student may not only significantly deviate from the teacher probabilities, but may also outdo than the teacher in… ▽ More

    Submitted 18 March, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

  25. arXiv:2301.12386  [pdf, other

    cs.LG

    Plugin estimators for selective classification with out-of-distribution detection

    Authors: Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar

    Abstract: Real-world classifiers can benefit from the option of abstaining from predicting on samples where they have low confidence. Such abstention is particularly useful on samples which are close to the learned decision boundary, or which are outliers with respect to the training sample. These settings have been the subject of extensive but disjoint study in the selective classification (SC) and out-of-… ▽ More

    Submitted 24 July, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

  26. arXiv:2301.12245  [pdf, other

    cs.LG

    Supervision Complexity and its Role in Knowledge Distillation

    Authors: Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar

    Abstract: Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps. In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages supervision complexity: a measure of alignment between teacher-provided supervision and the student's neural tangent kernel. The framework highlights a delicate inte… ▽ More

    Submitted 28 January, 2023; originally announced January 2023.

    Comments: Published at ICLR 2023

  27. arXiv:2301.12005  [pdf, other

    cs.LG

    EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

    Authors: Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar

    Abstract: Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice. Inspired by our theoretical analysis of the teacher-student generalization gap for IR models, we propose a novel distillation approach that leverages… ▽ More

    Submitted 3 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  28. arXiv:2210.16413  [pdf, other

    cs.LG

    When does mixup promote local linearity in learned representations?

    Authors: Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

    Abstract: Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of semi-supervised learning techniques such as mixmatch~\citep{berthelot2019mixmatch} and interpolation consistent training (ICT)~\citep{verma2019interpolation}. In this pape… ▽ More

    Submitted 28 October, 2022; originally announced October 2022.

    Journal ref: NeurIPS 2022 (First Workshop on Interpolation and Beyond)

  29. arXiv:2206.06479  [pdf, other

    cs.LG

    Robust Distillation for Worst-class Performance

    Authors: Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

    Abstract: Knowledge distillation has proven to be an effective technique in improving the performance a student model using predictions from a teacher model. However, recent work has shown that gains in average efficiency are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may… ▽ More

    Submitted 13 June, 2022; originally announced June 2022.

  30. arXiv:2204.13208  [pdf, other

    cs.LG stat.ML

    ELM: Embedding and Logit Margins for Long-Tail Learning

    Authors: Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural m… ▽ More

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: 24 pages

  31. The leap to ordinal: detailed functional prognosis after traumatic brain injury with a flexible modelling approach

    Authors: Shubhayu Bhattacharyay, Ioan Milosevic, Lindsay Wilson, David K. Menon, Robert D. Stevens, Ewout W. Steyerberg, David W. Nelson, Ari Ercole, the CENTER-TBI investigators/participants

    Abstract: When a patient is admitted to the intensive care unit (ICU) after a traumatic brain injury (TBI), an early prognosis is essential for baseline risk adjustment and shared decision making. TBI outcomes are commonly categorised by the Glasgow Outcome Scale-Extended (GOSE) into 8, ordered levels of functional recovery at 6 months after injury. Existing ICU prognostic models predict binary outcomes at… ▽ More

    Submitted 4 May, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 68 pages, 4 figures, 4 tables, 1 appendix, 6 supplementary figures, 4 supplementary tables, 3 supplementary methods, 1 supplementary result

    ACM Class: J.3; I.2.0; I.5.1

    Journal ref: PLOS ONE 17:7 (2022) e0270973

  32. arXiv:2110.10305  [pdf, other

    cs.LG

    When in Doubt, Summon the Titans: Efficient Inference with Large Models

    Authors: Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

    Abstract: Scaling neural networks to "large" sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of the large models, while la… ▽ More

    Submitted 19 October, 2021; originally announced October 2021.

  33. arXiv:2107.08964  [pdf, other

    cs.CV

    Transductive image segmentation: Self-training and effect of uncertainty estimation

    Authors: Konstantinos Kamnitsas, Stefan Winzeck, Evgenios N. Kornaropoulos, Daniel Whitehouse, Cameron Englman, Poe Phyu, Norman Pao, David K. Menon, Daniel Rueckert, Tilak Das, Virginia F. J. Newcombe, Ben Glocker

    Abstract: Semi-supervised learning (SSL) uses unlabeled data during training to learn better models. Previous studies on SSL for medical image segmentation focused mostly on improving model generalization to unseen data. In some applications, however, our primary interest is not generalization but to obtain optimal predictions on a specific unlabeled database that is fully available during model development… ▽ More

    Submitted 2 August, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: Published at Domain Adaptation and Representation Transfer (DART) wshop at MICCAI 2021. This version improves methods' names and adds 1 experiment in Tab.3a

  34. arXiv:2107.04641  [pdf, other

    cs.LG cs.AI

    Training Over-parameterized Models with Non-decomposable Objectives

    Authors: Harikrishna Narasimhan, Aditya Krishna Menon

    Abstract: Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the t… ▽ More

    Submitted 9 July, 2021; originally announced July 2021.

  35. arXiv:2106.10494  [pdf, other

    cs.LG

    Teacher's pet: understanding and mitigating biases in distillation

    Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model. Several works have shown that distillation significantly boosts the student's overall performance; however, are these gains uniform across all data subgroups? In this paper, we show that distillation can harm performance on certain s… ▽ More

    Submitted 8 July, 2021; v1 submitted 19 June, 2021; originally announced June 2021.

    Comments: 21 pages, 8 figures

  36. arXiv:2105.05736  [pdf, other

    cs.LG stat.ML

    Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

    Abstract: Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off pe… ▽ More

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: To appear in ICML 2021

  37. arXiv:2104.07932  [pdf, other

    cs.LG cs.CE stat.ML

    Interval-censored Hawkes processes

    Authors: Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie

    Abstract: Interval-censored data solely records the aggregated counts of events during specific time intervals - such as the number of patients admitted to the hospital or the volume of vehicles passing traffic loop detectors - and not the exact occurrence time of the events. It is currently not understood how to fit the Hawkes point processes to this kind of data. Its typical loss function (the point proce… ▽ More

    Submitted 25 November, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

    Journal ref: Journal of Machine Learning Research, 23(338):1-84, 2022. https://jmlr.org/papers/v23/21-0917.html

  38. arXiv:2102.06849  [pdf, other

    cs.LG cs.AI stat.ML

    Distilling Double Descent

    Authors: Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

    Abstract: Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset. The most common explanations for why distillation "works" are predicated on the assumption that student is provided with \emph{soft} labels, \eg probabilities or confidences, from the teacher model. In this work, we show, that,… ▽ More

    Submitted 12 February, 2021; originally announced February 2021.

  39. arXiv:2010.07447  [pdf, ps, other

    cs.CL cs.LG

    Semantic Label Smoothing for Sequence to Sequence Problems

    Authors: Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

    Abstract: Label smoothing has been shown to be an effective regularization strategy in classification, that prevents overfitting and helps in label de-noising. However, extending such methods directly to seq2seq settings, such as Machine Translation, is challenging: the large target output space of such problems makes it intractable to apply label smoothing over all possible outputs. Most existing approache… ▽ More

    Submitted 14 October, 2020; originally announced October 2020.

  40. arXiv:2010.02568  [pdf, other

    cs.CL

    SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy

    Authors: Umanga Bista, Alexander Patrick Mathews, Aditya Krishna Menon, Lexing Xie

    Abstract: Most work on multi-document summarization has focused on generic summarization of information present in each individual document set. However, the under-explored setting of update summarization, where the goal is to identify the new information present in each set, is of equal practical interest (e.g., presenting readers with updates on an evolving news topic). In this work, we present SupMMD, a… ▽ More

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 15 pages

    Journal ref: EMNLP 2020

  41. arXiv:2007.07314  [pdf, other

    cs.LG stat.ML

    Long-tail learning via logit adjustment

    Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

    Abstract: Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these chall… ▽ More

    Submitted 9 July, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: Published as a conference paper in ICLR 2021

  42. arXiv:2005.10419  [pdf, other

    cs.LG stat.ML

    Why distillation helps: a statistical perspective

    Authors: Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

    Abstract: Knowledge distillation is a technique for improving the performance of a simple "student" model by replacing its one-hot training labels with a distribution over labels obtained from a complex "teacher" model. While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help? In this paper, we present a statistical perspective on distillation w… ▽ More

    Submitted 20 May, 2020; originally announced May 2020.

  43. arXiv:2004.11460  [pdf, other

    q-bio.QM cs.CY cs.LG stat.ML

    Development of a Machine Learning Model and Mobile Application to Aid in Predicting Dosage of Vitamin K Antagonists Among Indian Patients

    Authors: Amruthlal M, Devika S, Ameer Suhail P A, Aravind K Menon, Vignesh Krishnan, Alan Thomas, Manu Thomas, Sanjay G, Lakshmi Kanth L R, Jimmy Jose, Harikrishnan S

    Abstract: Patients who undergo mechanical heart valve replacements or have conditions like Atrial Fibrillation have to take Vitamin K Antagonists (VKA) drugs to prevent coagulation of blood. These drugs have narrow therapeutic range and need to be very closely monitored due to life threatening side effects. The dosage of VKA drug is determined and revised by a physician based on Prothrombin Time - Internati… ▽ More

    Submitted 19 April, 2020; originally announced April 2020.

  44. arXiv:2004.10915  [pdf, other

    cs.LG stat.ML

    Doubly-stochastic mining for heterogeneous retrieval

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge. The first challenge concerns scalability: with a large number of labels, standard losses are difficult to optimise even on a single example.… ▽ More

    Submitted 22 April, 2020; originally announced April 2020.

  45. arXiv:2004.10342  [pdf, ps, other

    cs.LG stat.ML

    Federated Learning with Only Positive Labels

    Authors: Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class. As a result, during each federated learning round, the users need to locally update the classifier without having access to the features and the model parameters for the negative classes. Thus, naively employing conventional decentra… ▽ More

    Submitted 21 April, 2020; originally announced April 2020.

  46. arXiv:2003.12017  [pdf

    q-bio.PE cs.LG

    Prediction of number of cases expected and estimation of the final size of coronavirus epidemic in India using the logistic model and genetic algorithm

    Authors: Ganesh Kumar M, Soman K. P, Gopalakrishnan E. A, Vijay Krishna Menon, Sowmya V

    Abstract: In this paper, we have applied the logistic growth regression model and genetic algorithm to predict the number of coronavirus infected cases that can be expected in upcoming days in India and also estimated the final size and its peak time of the coronavirus epidemic in India.

    Submitted 26 March, 2020; originally announced March 2020.

  47. arXiv:2003.02819  [pdf, other

    cs.LG stat.ML

    Does label smoothing mitigate label noise?

    Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors. Empirically, smoothing has been shown to improve both predictive performance and model calibration. In this paper, we study whether label smoothing is also effective as a means of coping with label noise. While label smoothing apparently amplifies this problem --… ▽ More

    Submitted 5 March, 2020; originally announced March 2020.

  48. arXiv:2002.03555  [pdf, other

    cs.LG stat.ML

    Supervised Learning: No Loss No Cry

    Authors: Richard Nock, Aditya Krishna Menon

    Abstract: Supervised learning requires the specification of a loss function to minimise. While the theory of admissible losses from both a computational and statistical perspective is well-developed, these offer a panoply of different choices. In practice, this choice is typically made in an \emph{ad hoc} manner. In hopes of making this procedure more principled, the problem of \emph{learning the loss funct… ▽ More

    Submitted 10 February, 2020; originally announced February 2020.

    ACM Class: I.2.6

  49. arXiv:1909.09667  [pdf, other

    cs.LG stat.ML

    Online Hierarchical Clustering Approximations

    Authors: Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

    Abstract: Hierarchical clustering is a widely used approach for clustering datasets at multiple levels of granularity. Despite its popularity, existing algorithms such as hierarchical agglomerative clustering (HAC) are limited to the offline setting, and thus require the entire dataset to be available. This prohibits their use on large datasets commonly encountered in modern learning applications. In this p… ▽ More

    Submitted 20 September, 2019; originally announced September 2019.

    Comments: 17 pages, 3 figures

  50. arXiv:1901.10837  [pdf, other

    cs.LG cs.AI cs.CY stat.ML

    Noise-tolerant fair classification

    Authors: Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma

    Abstract: Fairness-aware learning involves designing algorithms that do not discriminate with respect to some sensitive feature (e.g., race or gender). Existing work on the problem operates under the assumption that the sensitive feature available in one's training sample is perfectly reliable. This assumption may be violated in many real-world cases: for example, respondents to a survey may choose to conce… ▽ More

    Submitted 9 January, 2020; v1 submitted 30 January, 2019; originally announced January 2019.