
Showing 1–7 of 7 results for author: Mudgal, S

Searching in archive cs.
  1. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1326 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2310.17022  [pdf, other]

    cs.LG cs.AI cs.CL

    Controlled Decoding from Language Models

    Authors: Sidharth Mudgal, Jong Lee, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami

    Abstract: KL-regularized reinforcement learning (RL) is a popular alignment framework to control the language model responses towards high reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at infe…

    Submitted 3 June, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  3. arXiv:2203.08122  [pdf, other]

    cs.CV

    From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction

    Authors: Evin Pınar Örnek, Shristi Mudgal, Johanna Wald, Yida Wang, Nassir Navab, Federico Tombari

    Abstract: There have been numerous recently proposed methods for monocular depth prediction (MDP) coupled with the equally rapid evolution of benchmarking tools. However, we argue that MDP is currently witnessing benchmark over-fitting and relying on metrics that are only partially helpful to gauge the usefulness of the predictions for 3D applications. This limits the design and development of novel methods…

    Submitted 15 March, 2022; originally announced March 2022.

  4. arXiv:2107.04512  [pdf, other]

    cs.CL cs.LG

    Using Machine Translation to Localize Task Oriented NLG Output

    Authors: Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano

    Abstract: One of the challenges in a task oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages. This paper explores doing this by applying machine translation to the English output. Using machine translation is very scalable, as it can work with any English output and can handle dynamic text, but otherwise the problem is a poor fit. The…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 12 pages, 10 figures

  5. arXiv:2010.12251  [pdf, other]

    cs.CL

    A scalable framework for learning from implicit user feedback to improve natural language understanding in large-scale conversational AI systems

    Authors: Sunghyun Park, Han Li, Ameen Patel, Sidharth Mudgal, Sungjin Lee, Young-Bum Kim, Spyros Matsoukas, Ruhi Sarikaya

    Abstract: Natural Language Understanding (NLU) is an established component within a conversational AI or digital assistant system, and it is responsible for producing semantic understanding of a user request. We propose a scalable and automatic approach for improving NLU in a large-scale conversational AI system by leveraging implicit user feedback, with an insight that user interaction data and dialog cont…

    Submitted 10 September, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: EMNLP 2021

    ACM Class: I.2.7; I.2.1

  6. arXiv:1905.00921  [pdf, other]

    cs.LG cs.AI stat.ML

    Continuous Learning for Large-scale Personalized Domain Classification

    Authors: Han Li, Jihwan Lee, Sidharth Mudgal, Ruhi Sarikaya, Young-Bum Kim

    Abstract: Domain classification is the task of mapping spoken language utterances to one of the natural language understanding domains in intelligent personal digital assistants (IPDAs). This is a major component in mainstream IPDAs in industry. Apart from official domains, thousands of third-party domains are also created by external developers to enhance the capability of IPDAs. As more domains are develo…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: NAACL-HLT 2019

  7. arXiv:1809.04259  [pdf, ps, other]

    cs.CL cs.LG

    Generalizing Word Embeddings using Bag of Subwords

    Authors: Jinman Zhao, Sidharth Mudgal, Yingyu Liang

    Abstract: We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character $n$-grams. The model is simple, fast to train and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art pe…

    Submitted 12 September, 2018; originally announced September 2018.

    Comments: Accepted to EMNLP 2018
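Result 7 describes building a word vector by treating the word as a bag of character n-grams, which is what lets the model cover rare or unseen words. The sketch below illustrates that general idea only and is not the paper's implementation. The following are all assumptions not stated in the listing: the n-gram range (3 to 6), the `<` and `>` boundary markers, averaging the n-gram vectors, and the hypothetical helpers `char_ngrams` and `word_vector`. The n-gram table here is random toy data, not trained vectors.

```python
import random
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return the character n-grams of `word`.

    Assumption: the word is wrapped in '<' and '>' boundary markers so that
    prefixes and suffixes yield distinct n-grams.
    """
    padded = f"<{word}>"
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def word_vector(word: str, ngram_table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the word's n-grams found in `ngram_table`.

    Assumption: averaging is the composition function. A word with no known
    n-grams maps to the zero vector.
    """
    vecs = [ngram_table[g] for g in char_ngrams(word) if g in ngram_table]
    if not vecs:
        return [0.0] * dim
    # zip(*vecs) yields one tuple per dimension; average each column.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    # Toy n-gram table built from a tiny "vocabulary" with random vectors (hypothetical data).
    table: Dict[str, List[float]] = {}
    for w in ["walk", "walked", "talking"]:
        for g in char_ngrams(w):
            table.setdefault(g, [random.uniform(-1, 1) for _ in range(dim)])
    # "walking" is not in the toy vocabulary but shares n-grams such as "<wa", "alk", "king>".
    print(word_vector("walking", table, dim))
```

Because the vector is a function of its n-grams, an out-of-vocabulary word still gets a non-zero vector whenever it shares subwords with words seen during training.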