
Showing 1–9 of 9 results for author: Garcia-Olano, D

Searching in archive cs.
  1. arXiv:2505.15323 [pdf, other]

    cs.CL

    Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack

    Authors: Silvia Cappelletti, Tobia Poppi, Samuele Poppi, Zheng-Xin Yong, Diego Garcia-Olano, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

    Abstract: Large Language Models (LLMs) are increasingly evaluated on multiple-choice question answering (MCQA) tasks using *first-token probability* (FTP), which selects the answer option whose initial token has the highest likelihood. While efficient, FTP can be fragile: models may assign high probability to unrelated tokens (*misalignment*) or use a valid token merely as part of a generic preamble rather…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 13 pages, 5 figures, 7 tables

  2. arXiv:2407.21783 [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, et al. (536 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 23 November, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  3. arXiv:2312.05491 [pdf, other]

    cs.CL cs.AI

    Using Captum to Explain Generative Language Models

    Authors: Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan

    Abstract: Captum is a comprehensive library for model explainability in PyTorch, offering a range of methods from the interpretability literature to enhance users' understanding of PyTorch models. In this paper, we introduce new features in Captum that are specifically designed to analyze the behavior of generative language models. We provide an overview of the available functionalities and example applicat…

    Submitted 9 December, 2023; originally announced December 2023.

    ACM Class: I.2.7

  4. arXiv:2312.04712 [pdf, other]

    cs.LG

    Error Discovery by Clustering Influence Embeddings

    Authors: Fulton Wang, Julius Adebayo, Sarah Tan, Diego Garcia-Olano, Narine Kokhlikyan

    Abstract: We present a method for identifying groups of test examples -- slices -- on which a model under-performs, a task now known as slice discovery. We formalize coherence -- a requirement that erroneous predictions, within a slice, should be wrong for the same reason -- as a key property that any slice discovery method should satisfy. We then use influence functions to derive a new slice discovery meth…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 conference paper

  5. arXiv:2212.01641 [pdf, other]

    cs.CL cs.LG

    Intermediate Entity-based Sparse Interpretable Representation Learning

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh, Byron C. Wallace

    Abstract: Interpretable entity representations (IERs) are sparse embeddings that are "human-readable" in that dimensions correspond to fine-grained entity types and values are predicted probabilities that a given entity is of the corresponding type. These methods perform well in zero-shot and low supervision settings. Compared to standard dense neural embeddings, such interpretable representations may permi…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted into BlackBox NLP Workshop at EMNLP 2022

  6. arXiv:2112.06888 [pdf, other]

    cs.CL cs.CV cs.LG

    Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh

    Abstract: Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task requiring external world knowledge in order to correctly answer a text question and associated image. Recent single modality text work has shown knowledge injection into pre-trained language models, specifically entity enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks. In this work, w…

    Submitted 13 December, 2021; originally announced December 2021.

    Journal ref: Proceedings of the 1st International Workshop on Multimodal Understanding for the Web and Social Media, co-located with the Web Conference 2022 (WWW '22 Companion), April 25–29, 2022, Virtual Event, Lyon, France

  7. arXiv:2106.09502 [pdf, other]

    cs.CL cs.LG

    Biomedical Interpretable Entity Representations

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Ioana Baldini, Joydeep Ghosh, Byron C. Wallace, Kush R. Varshney

    Abstract: Pre-trained language models induce dense entity representations that offer strong performance on entity-centric NLP tasks, but such representations are not immediately interpretable. This can be a barrier to model uptake in important domains such as biomedicine. There has been recent work on general interpretable representation learning (Onoe and Durrett, 2020), but these domain-agnostic represent…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted into Findings of ACL-IJCNLP 2021

  8. arXiv:1909.10506 [pdf, other]

    cs.CL cs.IR cs.LG

    Learning Dense Representations for Entity Retrieval

    Authors: Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano

    Abstract: We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search. Unlike prior work, this setup does not rely on an alias table followed by a re-ranker, and is thus the first fully learned entity retrieval model. We show…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: CoNLL 2019

  9. arXiv:1904.08935 [pdf, other]

    cs.LG cs.AI stat.ML

    Explaining Deep Classification of Time-Series Data with Learned Prototypes

    Authors: Alan H. Gee, Diego Garcia-Olano, Joydeep Ghosh, David Paydarfar

    Abstract: The emergence of deep learning networks raises a need for explainable AI so that users and domain experts can be confident applying them to high-risk decisions. In this paper, we leverage data from the latent space induced by deep learning models to learn stereotypical representations or "prototypes" during training to elucidate the algorithmic decision-making process. We study how leveraging prot…

    Submitted 4 September, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: The first two authors contributed equally. Accepted May 20, Presented Jun 14, 2019 at the ICML Time-series Workshop in Long Beach, CA, USA. Accepted June 15, Presented Aug 11, 2019 at the IJCAI Workshop on Knowledge Discovery in Healthcare Data in Macao, China. Formal proceedings available in the CEUR Workshop Proceedings (http://ceur-ws.org/Vol-2429/)

    Journal ref: Proceedings of the 4th International Workshop on Knowledge Discovery in Healthcare Data, co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)
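
Result 1 (arXiv:2505.15323) describes first-token probability (FTP) scoring for multiple-choice QA: the answer option whose initial token is most likely is selected. The following is a minimal sketch of plain FTP selection only, not the prefilling method that paper proposes. It assumes the model's next-token log-probabilities have already been extracted into a dict keyed by token string; the function name `ftp_select` and all numbers are hypothetical. The toy distribution also illustrates the misalignment the abstract mentions: the single most likely token is an unrelated preamble word.

```python
import math


def ftp_select(next_token_logprobs: dict[str, float], option_tokens: list[str]) -> str:
    """Return the option whose first token has the highest next-token log-probability."""
    # Option tokens missing from the distribution score -inf, so they never win
    # unless every option is missing.
    return max(option_tokens, key=lambda tok: next_token_logprobs.get(tok, -math.inf))


# Hypothetical next-token distribution after a prompt ending in "Answer:".
logprobs = {
    "The": math.log(0.40),  # unrelated preamble token (misalignment)
    "A": math.log(0.10),
    "B": math.log(0.30),
    "C": math.log(0.15),
    "D": math.log(0.05),
}

overall_top = max(logprobs, key=logprobs.get)        # most likely token overall
choice = ftp_select(logprobs, ["A", "B", "C", "D"])  # FTP restricted to option labels
print(overall_top, choice)  # prints: The B
```

Restricting the argmax to option tokens hides misalignment rather than fixing it: in this toy case the model places 40% of its mass on "The", and FTP silently ignores that.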
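
Result 8 (arXiv:1909.10506) describes entity retrieval with a dual encoder that places mentions and entities in one dense vector space and retrieves candidates by nearest neighbor search. The sketch below covers the retrieval step only and assumes the two encoder towers have already produced the vectors. The entity names and 3-dimensional vectors are hypothetical, cosine similarity is an assumed scoring choice, and exact brute-force search stands in for the approximate nearest neighbor search described in the abstract.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def retrieve(mention_vec: Vector, entity_vecs: dict[str, Vector], k: int = 1) -> list[str]:
    """Exact top-k search by cosine similarity; a linear scan stands in for ANN."""
    q = normalize(mention_vec)
    scores = {
        name: sum(a * b for a, b in zip(q, normalize(vec)))
        for name, vec in entity_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical outputs of the entity tower (computed once, independent of any mention).
entities = {
    "Paris (city)": [0.9, 0.1, 0.0],
    "Paris Hilton": [0.1, 0.9, 0.1],
    "Paris, Texas (film)": [0.2, 0.3, 0.9],
}
# Hypothetical output of the mention tower for "... flew to Paris ...".
mention = [0.8, 0.2, 0.1]

print(retrieve(mention, entities, k=2))  # ['Paris (city)', 'Paris, Texas (film)']
```

Because entity vectors do not depend on the mention, they can be encoded once and indexed ahead of time; at scale, an approximate nearest neighbor index would replace the linear scan in `retrieve`.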
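
Results 5 and 7 (arXiv:2212.01641, arXiv:2106.09502) build on interpretable entity representations, in which each dimension names a fine-grained entity type and each value is the predicted probability that the entity has that type. The sketch below shows how such a vector can be read, assuming a typing model has already produced one logit per type (the model itself is not shown). The six-type vocabulary, the logits, and the 0.5 threshold are hypothetical, and an independent sigmoid per type is an assumed choice that treats typing as multi-label classification.

```python
import math

# Hypothetical fine-grained type vocabulary; each embedding dimension is one type.
TYPES = ["person", "politician", "athlete", "city", "organization", "disease"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def interpretable_embedding(type_logits: list[float]) -> dict[str, float]:
    """Map per-type logits to independent probabilities, one named dimension per type."""
    return {t: sigmoid(z) for t, z in zip(TYPES, type_logits)}


def explain(embedding: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the types whose probability clears the threshold, most probable first."""
    kept = [t for t, p in embedding.items() if p >= threshold]
    return sorted(kept, key=embedding.get, reverse=True)


# Hypothetical logits for an entity mention such as a head of state.
embedding = interpretable_embedding([4.0, 2.5, -3.0, -5.0, -1.0, -6.0])
print(explain(embedding))  # ['person', 'politician']
```

Because every dimension has a name, explaining the representation is a direct lookup over its values, with no separate attribution step.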