
Showing 1–4 of 4 results for author: Dhekane, E G

  1. arXiv:2505.20295

    cs.CL cs.AI cs.LG stat.ML

    Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?

    Authors: Michael Kirchhof, Luca Füger, Adam Goliński, Eeshan Gunesh Dhekane, Arno Blaas, Sinead Williamson

    Abstract: To reveal when a large language model (LLM) is uncertain about a response, uncertainty quantification commonly produces percentage numbers along with the output. But is this all we can do? We argue that in the output space of LLMs, the space of strings, exist strings expressive enough to summarize the distribution over output strings the LLM deems possible. We lay a foundation for this new avenue…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2403.05490

    cs.LG cs.AI cs.CV cs.IT stat.ML

    Poly-View Contrastive Learning

    Authors: Amitis Shidani, Devon Hjelm, Jason Ramapuram, Russ Webb, Eeshan Gunesh Dhekane, Dan Busbridge

    Abstract: Contrastive learning typically matches pairs of related views among a number of unrelated negative views. Views can be generated (e.g. by augmentations) or be observed. We investigate matching when there are more than two related views which we call poly-view tasks, and derive new representation learning objectives using information maximization and sufficient statistics. We show that with unlimit…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024. 42 pages, 7 figures, 3 tables, loss pseudo-code included in appendix

  3. arXiv:2307.13813

    stat.ML cs.AI cs.LG

    How to Scale Your EMA

    Authors: Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb

    Abstract: Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule, for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functio…

    Submitted 7 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Spotlight at NeurIPS 2023, 53 pages, 32 figures, 17 tables

  4. arXiv:1906.03574

    cs.LG cs.AI stat.ML

    Transfer Learning by Modeling a Distribution over Policies

    Authors: Disha Shrivastava, Eeshan Gunesh Dhekane, Riashat Islam

    Abstract: Exploration and adaptation to new tasks in a transfer learning setup is a central challenge in reinforcement learning. In this work, we build on the idea of modeling a distribution over policies in a Bayesian deep reinforcement learning setup to propose a transfer strategy. Recent works have shown to induce diversity in the learned policies by maximizing the entropy of a distribution of policies (…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: Accepted at the ICML 2019 workshop on Multi-Task and Lifelong Reinforcement Learning