Showing 1–6 of 6 results for author: Hooker, S

Searching in archive stat.
  1. arXiv:2504.20879

    cs.AI cs.CL cs.LG stat.ME

    The Leaderboard Illusion

    Authors: Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D'Souza, Sayash Kapoor, Ahmet Üstün, Sanmi Koyejo, Yuntian Deng, Shayne Longpre, Noah A. Smith, Beyza Ermis, Marzieh Fadaee, Sara Hooker

    Abstract: Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. Yet, in this work we identify systematic issues that have resulted in a distorted playing field. We find that undisclosed private test…

    Submitted 12 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: 68 pages, 18 figures, 9 tables

  2. arXiv:2303.00586

    stat.ML cs.AI cs.CV cs.CY cs.LG

    FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling

    Authors: Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker

    Abstract: Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a larger single model. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, we observe that even with a simple homogeneous ensemble -- all the individual DNNs share the same training set, architecture…

    Submitted 20 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  3. arXiv:1911.05248

    cs.LG cs.AI cs.CV cs.HC stat.ML

    What Do Compressed Deep Neural Networks Forget?

    Authors: Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, Andrea Frome

    Abstract: Deep neural network pruning and quantization techniques have demonstrated it is possible to achieve high levels of compression with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques. We find that models with radically different numbers of weight…

    Submitted 5 September, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

  4. arXiv:1902.09574

    cs.LG stat.ML

    The State of Sparsity in Deep Neural Networks

    Authors: Trevor Gale, Erich Elsen, Sara Hooker

    Abstract: We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller dataset…

    Submitted 25 February, 2019; originally announced February 2019.

  5. arXiv:1806.10758

    cs.LG cs.AI stat.ML

    A Benchmark for Interpretability Methods in Deep Neural Networks

    Authors: Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim

    Abstract: We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are not better than a random designation of feature importance. Only certain ensemble based approaches---VarGrad and Smoo…

    Submitted 4 November, 2019; v1 submitted 27 June, 2018; originally announced June 2018.

    Comments: In NeurIPS 2019

  6. arXiv:1711.00867

    stat.ML cs.LG

    The (Un)reliability of saliency methods

    Authors: Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim

    Abstract: Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step ---adding a constant shift to the input data--- to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribut…

    Submitted 2 November, 2017; originally announced November 2017.