Showing 1–11 of 11 results for author: Chidambaram, M

  1. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  2. arXiv:2409.13074  [pdf, other]

    cs.LG cs.CV stat.ML

    What does guidance do? A fine-grained analysis in a simple setting

    Authors: Muthu Chidambaram, Khashayar Gatmiry, Sitan Chen, Holden Lee, Jianfeng Lu

    Abstract: The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by rigorously proving that guidance fails to sample from the intended tilted distribution. Our main result is to give a fine-grained characterization of…

    Submitted 19 September, 2024; originally announced September 2024.

  3. arXiv:2406.04068  [pdf, other]

    cs.LG math.ST stat.ML

    Reassessing How to Compare and Improve the Calibration of Machine Learning Models

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction. This property has become increasingly important as the impact of machine learning models has continued to spread to various domains. As a result, there are now a dizzying number of recent papers on measuring and improving the calibr…

    Submitted 23 February, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: ICLR 2025, 29 pages, 14 figures

  4. arXiv:2402.10046  [pdf, other]

    cs.LG math.PR

    How Flawed Is ECE? An Analysis via Logit Smoothing

    Authors: Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov

    Abstract: Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fu…

    Submitted 3 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 23 pages, 6 figures

    MSC Class: 68T37 (Primary) 62-08; 60E05 (Secondary)

  5. arXiv:2402.06855  [pdf, other]

    cs.LG cs.CV

    For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade. An important subclass of data augmentation techniques - which includes both label smoothing and Mixup - involves modifying not only the input data but also the input label during model training. In this work, we analyze the role played by the label augmentation aspect of s…

    Submitted 12 February, 2025; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: ICLR 2025, 25 pages, 8 figures

  6. arXiv:2306.00740  [pdf, other]

    cs.LG stat.ML

    On the Limitations of Temperature Scaling for Distributions with Overlaps

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures such as temperature scaling. While temperature scaling is frequently used b…

    Submitted 13 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 27 pages, 9 figures, published in ICLR 2024

  7. arXiv:2302.12715  [pdf, other]

    cs.LG cs.AI

    Hiding Data Helps: On the Benefits of Masking for Sparse Coding

    Authors: Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

    Abstract: Sparse coding, which refers to modeling a signal as sparse linear combinations of the elements of a learned dictionary, has proven to be a successful (and interpretable) approach in applications such as signal processing, computer vision, and medical imaging. While this success has spurred much work on provable guarantees for dictionary recovery when the learned dictionary is the same size as the…

    Submitted 1 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 16 pages, 1 figure, ICML 2023

  8. arXiv:2210.13512  [pdf, other]

    cs.LG cs.AI cs.CV math.OC stat.ML

    Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

    Authors: Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

    Abstract: Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain so…

    Submitted 4 November, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 37 pages, 2 figures, ICML 2023, minor corrections in latest version

  9. arXiv:2110.07647  [pdf, other]

    cs.LG cs.AI

    Towards Understanding the Data Dependency of Mixup-style Training

    Authors: Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge

    Abstract: In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these b…

    Submitted 19 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 26 pages, 14 figures, Accepted to ICLR 2022 (Spotlight)

  10. arXiv:1810.12836  [pdf, other]

    cs.CL

    Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model

    Authors: Muthuraman Chidambaram, Yinfei Yang, Daniel Cer, Steve Yuan, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil

    Abstract: A significant roadblock in multilingual neural language modeling is the lack of labeled non-English data. One potential method for overcoming this issue is learning cross-lingual text representations that can be used to transfer the performance from training on English tasks to non-English tasks, despite little to no task-specific non-English data. In this paper, we explore a natural setup for lea…

    Submitted 1 August, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted at the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

    Journal ref: In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

  11. arXiv:1702.06762  [pdf, other]

    cs.LG

    Style Transfer Generative Adversarial Networks: Learning to Play Chess Differently

    Authors: Muthuraman Chidambaram, Yanjun Qi

    Abstract: The idea of style transfer has largely only been explored in image-based tasks, which we attribute in part to the specific nature of loss functions used for style transfer. We propose a general formulation of style transfer as an extension of generative adversarial networks, by using a discriminator to regularize a generator with an otherwise separate loss function. We apply our approach to the ta…

    Submitted 7 May, 2017; v1 submitted 22 February, 2017; originally announced February 2017.

    Comments: style transfer, Generative Adversarial Networks