Showing 1–6 of 6 results for author: Choshen, L

Searching in archive stat.
  1. arXiv:2412.06540

    cs.LG cs.AI stat.ML

    Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

    Authors: Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Yuekai Sun, Mikhail Yurochkin

    Abstract: Scaling laws for large language models (LLMs) predict model performance based on parameters like size and training data. However, differences in training configurations and data processing across model families lead to significant variations in benchmark performance, making it difficult for a single scaling law to generalize across all LLMs. On the other hand, training family-specific scaling laws…

    Submitted 4 February, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

  2. arXiv:2405.17202

    cs.CL cs.AI cs.LG stat.ML

    Efficient multi-prompt evaluation of LLMs

    Authors: Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin

    Abstract: Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards. Many recent works empirically verify prompt sensitivity and advocate for changes in LLM evaluation. In this paper, we consider the problem of estimating the performance distribution across many prompt va…

    Submitted 30 October, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024

  3. arXiv:2402.14992

    cs.CL cs.AI cs.LG stat.ML

    tinyBenchmarks: evaluating LLMs with fewer examples

    Authors: Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, Mikhail Yurochkin

    Abstract: The versatility of large language models (LLMs) led to the creation of diverse benchmarks that thoroughly test a variety of language models' abilities. These benchmarks consist of tens of thousands of examples making evaluation of LLMs very expensive. In this paper, we investigate strategies to reduce the number of evaluations needed to assess the performance of an LLM on several key benchmarks. F…

    Submitted 26 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning (ICML)

  4. arXiv:1907.08971

    cs.LG cs.CL stat.ML

    Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network

    Authors: Martin Gleize, Eyal Shnarch, Leshem Choshen, Lena Dankin, Guy Moshkowich, Ranit Aharonov, Noam Slonim

    Abstract: With the advancement in argument detection, we suggest to pay more attention to the challenging task of identifying the more convincing arguments. Machines capable of responding and interacting with humans in helpful ways have become ubiquitous. We now expect them to discuss with us the more delicate questions in our world, and they should do so armed with effective arguments. But what makes an ar…

    Submitted 23 July, 2019; v1 submitted 21 July, 2019; originally announced July 2019.

    Comments: accepted to ACL 2019 - long paper

  5. arXiv:1905.10854

    cs.LG stat.ML

    Let's Agree to Agree: Neural Networks Share Classification Order on Real Datasets

    Authors: Guy Hacohen, Leshem Choshen, Daphna Weinshall

    Abstract: We report a series of robust empirical observations, demonstrating that deep Neural Networks learn the examples in both the training and test sets in a similar order. This phenomenon is observed in all the commonly used benchmarks we evaluated, including many image classification benchmarks, and one text classification benchmark. While this phenomenon is strongest for models of the same architectu…

    Submitted 20 July, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: Published at ICML 2020

    Journal ref: Proceedings: 37th International Conference on Machine Learning (ICML), Vienna, Austria, July 2020

  6. arXiv:1804.04012

    cs.LG cs.AI stat.ML

    DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

    Authors: Leshem Choshen, Lior Fox, Yonatan Loewenstein

    Abstract: Exploration is a fundamental aspect of Reinforcement Learning, typically implemented using stochastic action-selection. Exploration, however, can be more efficient if directed toward gaining new world knowledge. Visit-counters have been proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality. While there are a few model-based…

    Submitted 11 April, 2018; originally announced April 2018.

    Comments: Final version for ICLR 2018