Skip to main content

Showing 1–15 of 15 results for author: Dua, D

Searching in archive cs. Search in all archives.
.
  1. arXiv:2507.06261  [pdf, ps, other

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3264 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde… ▽ More

    Submitted 11 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  2. arXiv:2503.14481  [pdf, other

    cs.LG cs.CL

    Don't lie to your friends: Learning what you know from collaborative self-play

    Authors: Jacob Eisenstein, Reza Aghajani, Adam Fisch, Dheeru Dua, Fantine Huot, Mirella Lapata, Vicky Zayats, Jonathan Berant

    Abstract: To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outputs, and when to abstain or hedge. Such capabilities are hard to teach through supervised fine-tuning because they require constructing examples that reflect the agent's specific capabilities. We there… ▽ More

    Submitted 31 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  3. arXiv:2406.13121  [pdf, other

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  4. arXiv:2212.10381  [pdf, other

    cs.CL

    To Adapt or to Annotate: Challenges and Interventions for Domain Adaptation in Open-Domain Question Answering

    Authors: Dheeru Dua, Emma Strubell, Sameer Singh, Pat Verga

    Abstract: Recent advances in open-domain question answering (ODQA) have demonstrated impressive accuracy on standard Wikipedia style benchmarks. However, it is less clear how robust these models are and how well they perform when applied to real-world applications in drastically different domains. While there has been some work investigating how well ODQA models perform when tested for out-of-domain (OOD) g… ▽ More

    Submitted 20 December, 2022; originally announced December 2022.

  5. arXiv:2212.04092  [pdf, other

    cs.CL

    Successive Prompting for Decomposing Complex Questions

    Authors: Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner

    Abstract: Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce ``Suc… ▽ More

    Submitted 8 December, 2022; originally announced December 2022.

  6. arXiv:2110.08246  [pdf, ps, other

    cs.CL

    Tricks for Training Sparse Translation Models

    Authors: Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan

    Abstract: Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks. Sparse scaling architectures, such as BASELayers, provide flexible mechanisms for different tasks to have a variable number of parameters, which can be useful to counterbalance skewed data distributions. We find that t… ▽ More

    Submitted 15 October, 2021; originally announced October 2021.

  7. arXiv:2104.08744  [pdf, other

    cs.CL

    Generative Context Pair Selection for Multi-hop Question Answering

    Authors: Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh

    Abstract: Compositional reasoning tasks like multi-hop question answering, require making latent decisions to get the final answer, given a question. However, crowdsourced datasets often capture only a slice of the underlying task distribution, which can induce unanticipated biases in models performing compositional reasoning. Furthermore, discriminatively trained models exploit such biases to get a better… ▽ More

    Submitted 18 April, 2021; originally announced April 2021.

  8. arXiv:2104.08735  [pdf, other

    cs.CL

    Learning with Instance Bundles for Reading Comprehension

    Authors: Dheeru Dua, Pradeep Dasigi, Sameer Singh, Matt Gardner

    Abstract: When training most modern reading comprehension models, all the questions associated with a context are treated as being independent from each other. However, closely related questions and their corresponding answers are not independent, and leveraging these relationships could provide a strong supervision signal to a model. Drawing on ideas from contrastive estimation, we introduce several new su… ▽ More

    Submitted 18 April, 2021; originally announced April 2021.

  9. arXiv:2010.06694  [pdf, other

    cs.HC

    Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

    Authors: Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasovic, Zhen Nie

    Abstract: High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection… ▽ More

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted to the demo track of EMNLP 2020

  10. arXiv:2004.02709  [pdf, other

    cs.CL

    Evaluating Models' Local Decision Boundaries via Contrast Sets

    Authors: Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang , et al. (1 additional authors not shown)

    Abstract: Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities. We propose a new annotation paradigm for NLP that helps to close systemati… ▽ More

    Submitted 1 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  11. arXiv:1912.12598  [pdf, ps, other

    cs.CL

    ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension

    Authors: Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner

    Abstract: Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. Given the availability of many such d… ▽ More

    Submitted 29 December, 2019; originally announced December 2019.

  12. arXiv:1904.03111  [pdf, other

    cs.CL

    PoMo: Generating Entity-Specific Post-Modifiers in Context

    Authors: Jun Seok Kang, Robert L. Logan IV, Zewei Chu, Yang Chen, Dheeru Dua, Kevin Gimpel, Sameer Singh, Niranjan Balasubramanian

    Abstract: We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, "Barack Obama, _______, supported the #MeToo movement.", the phrase "a father of two girls" is a contextually… ▽ More

    Submitted 8 April, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: NAACL-HLT 2019

  13. arXiv:1903.00161  [pdf, other

    cs.CL

    DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

    Authors: Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner

    Abstract: Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this cro… ▽ More

    Submitted 16 April, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

  14. arXiv:1710.11342  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    Generating Natural Adversarial Examples

    Authors: Zhengli Zhao, Dheeru Dua, Sameer Singh

    Abstract: Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these… ▽ More

    Submitted 23 February, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at the International Conference on Learning Representations (ICLR) 2018

  15. arXiv:1412.6150  [pdf

    cs.NI cs.CR

    Selective Watchdog Technique for Intrusion Detection in Mobile Ad-Hoc Network

    Authors: Deepika Dua, Atul Mishra

    Abstract: Mobile ad-hoc networks(MANET) is the collection of mobile nodes which are self organizing and are connected by wireless links where nodes which are not in the direct range communicate with each other relying on the intermediate nodes. As a result of trusting other nodes in the route, a malicious node can easily compromise the security of the network. A black-hole node is the malicious node which d… ▽ More

    Submitted 13 October, 2014; originally announced December 2014.

    Comments: 12 pages,8 figures

    Journal ref: International Journal on Applications of Graph Theory in Wireless Ad hoc Networks and Sensor Networks (GRAPH-HOC) Vol.6, No.3, September 2014