Showing 1–50 of 79 results for author: Bosselut, A

  1. arXiv:2504.06219  [pdf, other]

    cs.CL cs.LG

    Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

    Authors: Dongyang Fan, Vinko Sabolčec, Matin Ansaripour, Ayush Kumar Tarun, Martin Jaggi, Antoine Bosselut, Imanol Schlag

    Abstract: The increasing adoption of web crawling opt-outs by copyright holders of online content raises critical questions about the impact of data compliance on large language model (LLM) performance. However, little is known about how these restrictions (and the resultant filtering of pretraining datasets) affect the capabilities of models trained using these corpora. In this work, we conceptualize this…

    Submitted 8 April, 2025; originally announced April 2025.

  2. arXiv:2503.20871  [pdf, other]

    cs.CV cs.AI cs.CL

    VinaBench: Benchmark for Faithful and Consistent Visual Narratives

    Authors: Silin Gao, Sheryl Mathew, Li Mi, Sepideh Mamooler, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Syrielle Montariol, Antoine Bosselut

    Abstract: Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to the lack of knowledge constraints used for planning the stories. In this work, we propose a new benchmark, VinaBench, to addres…

    Submitted 3 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)

  3. arXiv:2503.01830  [pdf, other]

    cs.CL

    From Language to Cognition: How LLMs Outgrow the Human Language Network

    Authors: Badr AlKhamissi, Greta Tuckute, Yingtian Tang, Taha Binhuraib, Antoine Bosselut, Martin Schrimpf

    Abstract: Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language shaping brain-like representations, and their evolution during training as a function of different tasks remain unclear. We here benchmark 34 training checkpoints spanning 300B tokens across 8 different model sizes to analyze how brain alignment relat…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Preprint

  4. arXiv:2501.04671  [pdf, other]

    cs.CV cs.AI

    Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios

    Authors: Charles Corbière, Simon Roburin, Syrielle Montariol, Antoine Bosselut, Alexandre Alahi

    Abstract: While chain-of-thought (CoT) prompting improves reasoning in large language models, its effectiveness in vision-language models (VLMs) remains limited due to over-reliance on textual cues and memorized knowledge. To investigate the visual reasoning capabilities of VLMs in complex real-world scenarios, we introduce DrivingVQA, a visual question answering dataset derived from driving theory exams, w…

    Submitted 8 April, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

    Comments: Project page: https://vita-epfl.github.io/DrivingVQA

  5. arXiv:2412.11923  [pdf, other]

    cs.CL cs.AI

    PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection

    Authors: Sepideh Mamooler, Syrielle Montariol, Alexander Mathis, Antoine Bosselut

    Abstract: In-context learning (ICL) enables Large Language Models (LLMs) to perform tasks using few demonstrations, facilitating task adaptation when labeled examples are hard to obtain. However, ICL is sensitive to the choice of demonstrations, and it remains unclear which demonstration attributes enable in-context generalization. In this work, we conduct a perturbation study of in-context demonstrations f…

    Submitted 1 April, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: In Proceedings of NAACL2025

  6. arXiv:2412.03304  [pdf, other]

    cs.CL

    Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

    Authors: Shivalika Singh, Angelika Romanou, Clémentine Fourrier, David I. Adelani, Jian Gang Ngui, Daniel Vila-Suero, Peerat Limkonchotiwat, Kelly Marchisio, Wei Qi Leong, Yosephine Susanto, Raymond Ng, Shayne Longpre, Wei-Yin Ko, Sebastian Ruder, Madeline Smith, Antoine Bosselut, Alice Oh, Andre F. T. Martins, Leshem Choshen, Daphne Ippolito, Enzo Ferrante, Marzieh Fadaee, Beyza Ermis, Sara Hooker

    Abstract: Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from differences in language but also from the cultural knowledge required to interpret questions, reducing the practical utility of translated datasets like MMLU. Furthermore, translation often introduces artefacts that can distort the meaning or clarity of…

    Submitted 19 February, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

  7. arXiv:2411.19799  [pdf, other]

    cs.CL

    INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

    Authors: Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, et al. (34 additional authors not shown)

    Abstract: The performance differential of large language models (LLM) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other th…

    Submitted 29 November, 2024; originally announced November 2024.

  8. arXiv:2411.02280  [pdf, other]

    cs.CL cs.LG

    The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units

    Authors: Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

    Abstract: Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. We here ask whether similar specialization for language emerges in LLMs. W…

    Submitted 13 February, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: NAACL 2025

  9. arXiv:2410.17218  [pdf, other]

    cs.AI cs.CL

    Creativity in AI: Progresses and Challenges

    Authors: Mete Ismayilzada, Debjit Paul, Antoine Bosselut, Lonneke van der Plas

    Abstract: Creativity is the ability to produce novel, useful, and surprising ideas, and has been widely studied as a crucial aspect of human cognition. Machine creativity on the other hand has been a long-standing challenge. With the rise of advanced generative AI, there has been renewed interest and debate regarding AI's creative capabilities. Therefore, it is imperative to revisit the state of creativity…

    Submitted 9 December, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: minor updates to content + figures

  10. arXiv:2410.12656  [pdf, other]

    cs.CL cs.AI

    Evaluating Morphological Compositional Generalization in Large Language Models

    Authors: Mete Ismayilzada, Defne Circi, Jonne Sälevä, Hale Sirin, Abdullatif Köksal, Bhuwan Dhingra, Antoine Bosselut, Duygu Ataman, Lonneke van der Plas

    Abstract: Large language models (LLMs) have demonstrated significant progress in various natural language generation and understanding tasks. However, their linguistic generalization capabilities remain questionable, raising doubts about whether these models learn language similarly to humans. While humans exhibit compositional generalization and linguistic creativity in language use, the extent to which LL…

    Submitted 9 February, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: Accepted to NAACL 2025

  11. arXiv:2410.05362  [pdf, other]

    cs.CL cs.AI cs.LG

    LLMs Are In-Context Bandit Reinforcement Learners

    Authors: Giovanni Monea, Antoine Bosselut, Kianté Brantley, Yoav Artzi

    Abstract: Large Language Models (LLMs) excel at in-context learning (ICL), a supervised learning technique that relies on adding annotated examples to the model context. We investigate a contextual bandit version of in-context reinforcement learning (ICRL), where models learn in-context, online, from external reward, instead of supervised data. We show that LLMs effectively demonstrate such learning, and pr…

    Submitted 31 January, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

  12. arXiv:2408.11841  [pdf, other]

    cs.CY cs.AI cs.CL

    Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

    Authors: Beatriz Borges, Negar Foroutan, Deniz Bayazit, Anna Sotnikova, Syrielle Montariol, Tanya Nazaretzky, Mohammadreza Banaei, Alireza Sakhaeirad, Philippe Servant, Seyed Parsa Neshaei, Jibril Frej, Angelika Romanou, Gail Weiss, Sepideh Mamooler, Zeming Chen, Simin Fan, Silin Gao, Mete Ismayilzada, Debjit Paul, Alexandre Schöpfer, Andrej Janchevski, Anja Tiede, Clarence Linden, Emanuele Troiani, Francesco Salvi, et al. (65 additional authors not shown)

    Abstract: AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by…

    Submitted 27 November, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: 20 pages, 8 figures

    Journal ref: PNAS (2024) Vol. 121 | No. 49

  13. arXiv:2408.03618  [pdf, other]

    cs.CL cs.AI cs.LG

    A Logical Fallacy-Informed Framework for Argument Generation

    Authors: Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings

    Abstract: Despite the remarkable performance of Large Language Models (LLMs) in natural language processing tasks, they still struggle with generating logically sound arguments, resulting in potential risks such as spreading misinformation. To address this issue, we introduce FIPO, a fallacy-informed framework that leverages preference optimization methods to steer LLMs toward logically sound arguments. FIP…

    Submitted 3 May, 2025; v1 submitted 7 August, 2024; originally announced August 2024.

  14. arXiv:2406.15109  [pdf, other]

    cs.CL cs.LG

    Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

    Authors: Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

    Abstract: Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprisin…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Preprint

  15. arXiv:2406.11228  [pdf, other]

    cs.CL

    ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark

    Authors: Hiromi Wakaki, Yuki Mitsufuji, Yoshinori Maeda, Yukiko Nishimura, Silin Gao, Mengjie Zhao, Keiichi Yamada, Antoine Bosselut

    Abstract: We propose a new benchmark, ComperDial, which facilitates the training and evaluation of evaluation metrics for open-domain dialogue systems. ComperDial consists of human-scored responses for 10,395 dialogue turns in 1,485 conversations collected from 99 dialogue agents submitted to the Commonsense Persona-grounded Dialogue (CPD) challenge. As a result, for any dialogue, our benchmark includes mul…

    Submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.07222  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Autoformalization using Type Checking

    Authors: Auguste Poiroux, Gail Weiss, Viktor Kunčak, Antoine Bosselut

    Abstract: Autoformalization, the automatic translation of unconstrained natural language into formal languages, has garnered significant attention due to its potential applications in theorem proving, formal verification, and LLM output checking. In this work, we analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving langua…

    Submitted 11 February, 2025; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: New benchmarks released, see https://github.com/augustepoiroux/RLMEval , https://huggingface.co/datasets/PAug/ProofNetSharp , and https://huggingface.co/datasets/PAug/ProofNetVerif . For code, see https://github.com/augustepoiroux/LeanInteract

  17. Course Recommender Systems Need to Consider the Job Market

    Authors: Jibril Frej, Anna Dai, Syrielle Montariol, Antoine Bosselut, Tanja Käser

    Abstract: Current course recommender systems primarily leverage learner-course interactions, course content, learner preferences, and supplementary course details like instructor, institution, ratings, and reviews, to make their recommendation. However, these systems often overlook a critical aspect: the evolving skill demand of the job market. This paper focuses on the perspective of academic researchers,…

    Submitted 1 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: accepted at SIGIR 2024 as a perspective paper. Camera Ready will come soon

    ACM Class: H.3.3

  18. A Design Space for Intelligent and Interactive Writing Assistants

    Authors: Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Agnia Sergeyuk, Antonette Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, Sarah Sterman, et al. (11 additional authors not shown)

    Abstract: In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at CHI 2024

  19. arXiv:2403.13965  [pdf, other]

    cs.CV

    ConGeo: Robust Cross-view Geo-localization across Ground View Variations

    Authors: Li Mi, Chang Xu, Javiera Castillo-Navarro, Syrielle Montariol, Wen Yang, Antoine Bosselut, Devis Tuia

    Abstract: Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view. In real-world scenarios, the task requires accommodating diverse ground images captured by users with varying orientations and reduced field of views (FoVs). However, existing learning pipelines are orientation-specific or FoV-specific, demanding separate model…

    Submitted 4 September, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: ECCV2024. Project page at https://eceo-epfl.github.io/ConGeo/

  20. arXiv:2403.07398  [pdf, other]

    cs.CL cs.AI

    Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs

    Authors: Tianqing Fang, Zeming Chen, Yangqiu Song, Antoine Bosselut

    Abstract: Event commonsense reasoning requires the ability to reason about the relationship between events, as well as infer implicit context underlying that relationship. However, data scarcity makes it challenging for language models to learn to generate commonsense inferences for contexts and questions involving interactions between complex events. To address this demand, we present COM2 (COMplex COMmons…

    Submitted 22 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: ACL 2024

  21. arXiv:2403.00180  [pdf, other]

    cs.CL

    "Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models

    Authors: Karina Halevy, Anna Sotnikova, Badr AlKhamissi, Syrielle Montariol, Antoine Bosselut

    Abstract: Model editing has emerged as a cost-effective strategy to update knowledge stored in language models. However, model editing can have unintended consequences after edits are applied: information unrelated to the edits can also be changed, and other general behaviors of the model can be wrongly altered. In this work, we investigate how model editing methods unexpectedly amplify model biases post-ed…

    Submitted 3 October, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted to EMNLP 2024 Main. 9 pages, 4 figures

  22. arXiv:2402.17011  [pdf, other]

    cs.CL

    DiffuCOMET: Contextual Commonsense Knowledge Diffusion

    Authors: Silin Gao, Mete Ismayilzada, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut

    Abstract: Inferring contextually-relevant and diverse commonsense to understand narratives remains challenging for knowledge models. In this work, we develop a series of knowledge models, DiffuCOMET, that leverage diffusion to learn to reconstruct the implicit semantic connections between narrative contexts and relevant commonsense knowledge. Across multiple diffusion steps, our method progressively refines…

    Submitted 1 October, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  23. arXiv:2402.13950  [pdf, other]

    cs.CL

    Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning

    Authors: Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings

    Abstract: Large language models (LLMs) have been shown to perform better when asked to reason step-by-step before answering a question. However, it is unclear to what degree the model's final answer is faithful to the stated reasoning steps. In this paper, we perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps generated by the LLM influence the final outcome and fi…

    Submitted 6 October, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted at EMNLP Findings

  24. arXiv:2402.12846  [pdf, other]

    cs.CV cs.AI

    ConVQG: Contrastive Visual Question Generation with Multimodal Guidance

    Authors: Li Mi, Syrielle Montariol, Javiera Castillo-Navarro, Xianjie Dai, Antoine Bosselut, Devis Tuia

    Abstract: Asking questions about visual environments is a crucial way for intelligent agents to understand rich multi-faceted scenes, raising the importance of Visual Question Generation (VQG) systems. Apart from being grounded to the image, existing VQG systems can use textual constraints, such as expected answers or knowledge triplets, to generate focused questions. These constraints allow VQG systems to…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: AAAI 2024. Project page at https://limirs.github.io/ConVQG

  25. arXiv:2402.03832  [pdf, other]

    cs.CL

    Rethinking Skill Extraction in the Job Market Domain using Large Language Models

    Authors: Khanh Cao Nguyen, Mike Zhang, Syrielle Montariol, Antoine Bosselut

    Abstract: Skill Extraction involves identifying skills and qualifications mentioned in documents such as job postings and resumes. The task is commonly tackled by training supervised models using a sequence labeling approach with BIO tags. However, the reliance on manually annotated data limits the generalizability of such approaches. Moreover, the common BIO setting limits the ability of the models to capt…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Published at NLP4HR 2024 (EACL Workshop)

  26. arXiv:2402.03242  [pdf, other]

    cs.CL

    JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

    Authors: Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, Antoine Bosselut

    Abstract: Recent approaches in skill matching, employing synthetic training data for classification or similarity model training, have shown promising results, reducing the need for time-consuming and expensive annotations. However, previous synthetic datasets have limitations, such as featuring only one skill per sentence and generally comprising short sentences. In this paper, we introduce JobSkape, a fra…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Published at NLP4HR 2024 (EACL Workshop)

  27. arXiv:2401.17464  [pdf, other]

    cs.CL

    Efficient Tool Use with Chain-of-Abstraction Reasoning

    Authors: Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang

    Abstract: To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning to real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but there remains challenges for fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where inter-connected tool calls…

    Submitted 8 January, 2025; v1 submitted 30 January, 2024; originally announced January 2024.

  28. arXiv:2401.04536  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Language Model Agency through Negotiations

    Authors: Tim R. Davidson, Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West

    Abstract: We introduce an approach to evaluate language model (LM) agency using negotiation games. This approach better reflects real-world use cases and addresses some of the shortcomings of alternative LM benchmarks. Negotiation games enable us to study multi-turn, and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage. We use our approach to test six widely us…

    Submitted 16 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024, code and link to project data are made available at https://github.com/epfl-dlab/LAMEN

  29. arXiv:2401.03183  [pdf, other]

    cs.CL

    Exploring Defeasibility in Causal Reasoning

    Authors: Shaobo Cui, Lazar Milikic, Yiyang Feng, Mete Ismayilzada, Debjit Paul, Antoine Bosselut, Boi Faltings

    Abstract: Defeasibility in causal reasoning implies that the causal relationship between cause and effect can be strengthened or weakened. Namely, the causal strength between cause and effect should increase or decrease with the incorporation of strengthening arguments (supporters) or weakening arguments (defeaters), respectively. However, existing works ignore defeasibility in causal reasoning and fail to…

    Submitted 27 June, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL 2024 (Findings)

  30. arXiv:2312.00575  [pdf, other]

    cs.CL

    Instruction-tuning Aligns LLMs to the Human Brain

    Authors: Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut

    Abstract: Instruction-tuning is a widely adopted finetuning method that enables large language models (LLMs) to generate output that more closely resembles human responses. However, no studies have shown that instruction-tuning actually teaches LLMs to process language in a similar manner as humans. We investigate the effect of instruction-tuning on aligning LLM and human language processing mechanisms in t…

    Submitted 9 August, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: COLM 2024

  31. arXiv:2311.16079  [pdf, other]

    cs.CL cs.AI cs.LG

    MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

    Authors: Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

    Abstract: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by rele…

    Submitted 27 November, 2023; originally announced November 2023.

  32. arXiv:2311.04284  [pdf, other]

    cs.CL cs.AI

    CRAB: Assessing the Strength of Causal Relationships Between Real-world Events

    Authors: Angelika Romanou, Syrielle Montariol, Debjit Paul, Leo Laugier, Karl Aberer, Antoine Bosselut

    Abstract: Understanding narratives requires reasoning about the cause-and-effect relationships between events mentioned in the text. While existing foundation models yield impressive results in many NLP tasks requiring reasoning, it is unclear whether they understand the complexity of the underlying network of causal relationships of events in narratives. In this work, we present CRAB, a new Causal Reasonin…

    Submitted 7 November, 2023; originally announced November 2023.

  33. arXiv:2310.15258  [pdf, other]

    cs.CL

    Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention

    Authors: Negar Foroutan, Mohammadreza Banaei, Karl Aberer, Antoine Bosselut

    Abstract: In this work, we study whether multilingual language models (MultiLMs) can transfer logical reasoning abilities to other languages when they are fine-tuned for reasoning in a different language. We evaluate the cross-lingual reasoning abilities of MultiLMs in two schemes: (1) where the language of the context and the question remain the same in the new languages that are tested (i.e., the reasonin…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 - Findings

  34. arXiv:2310.15239  [pdf, other]

    cs.CL cs.AI

    CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

    Authors: Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut

    Abstract: Recent efforts in natural language processing (NLP) commonsense reasoning research have yielded a considerable number of new datasets and benchmarks. However, most of these datasets formulate commonsense reasoning challenges in artificial scenarios that are not reflective of the tasks which real-world NLP systems are designed to solve. In this work, we present CRoW, a manually-curated, multi-task…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 37 pages, camera-ready for EMNLP 2023

  35. arXiv:2310.14491  [pdf, other]

    cs.CL

    Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

    Authors: Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan

    Abstract: Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from pretraining corpus, or, via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. C…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: This work is published in EMNLP 2023

  36. arXiv:2310.03084  [pdf, other]

    cs.CL cs.AI cs.LG

    Discovering Knowledge-Critical Subnetworks in Pretrained Language Models

    Authors: Deniz Bayazit, Negar Foroutan, Zeming Chen, Gail Weiss, Antoine Bosselut

    Abstract: Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. However, localizing these representations and disentangling them from each other remains an open problem. In this work, we investigate whether pretrained language models contain various knowledge-critical subnetworks: particular sparse computational subgraphs that can, if removed, precisely suppress…

    Submitted 15 October, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: EMNLP 2024

  37. arXiv:2307.00279  [pdf, other]

    cs.CL

    Let Me Teach You: Pedagogical Foundations of Feedback for Language Models

    Authors: Beatriz Borges, Niket Tandon, Tanja Käser, Antoine Bosselut

    Abstract: Natural Language Feedback (NLF) is an increasingly popular mechanism for aligning Large Language Models (LLMs) to human preferences. Despite the diversity of the information it can convey, NLF methods are often hand-designed and arbitrary, with little systematic grounding. At the same time, research in learning sciences has long established several effective feedback models. In this opinion piece,…

    Submitted 23 October, 2024; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: EMNLP 2024; 9 pages, 3 figures

  38. arXiv:2305.19148  [pdf, other]

    cs.CL cs.AI cs.LG

    Mitigating Label Biases for In-context Learning

    Authors: Yu Fei, Yifan Hou, Zeming Chen, Antoine Bosselut

    Abstract: Various design settings for in-context learning (ICL), such as the choice and order of the in-context examples, can bias a model toward a particular prediction without being reflective of an understanding of the task. While many studies discuss these design choices, there have been few systematic investigations into categorizing them and mitigating their impact. In this work, we define a typology…

    Submitted 4 August, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  39. arXiv:2305.14869  [pdf, other]

    cs.CL

    CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering

    Authors: Weiqi Wang, Tianqing Fang, Wenxuan Ding, Baixuan Xu, Xin Liu, Yangqiu Song, Antoine Bosselut

    Abstract: The task of zero-shot commonsense question answering evaluates models on their capacity to reason about general scenarios beyond those presented in specific datasets. Existing approaches for tackling this task leverage external knowledge from CommonSense Knowledge Bases (CSKBs) by pretraining the model on synthetic QA pairs constructed from CSKBs. In these approaches, negative examples (distractor…

    Submitted 20 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Findings of EMNLP 2023

  40. arXiv:2305.06349  [pdf, other]

    cs.CL cs.AI cs.LG

    RECKONING: Reasoning through Dynamic Knowledge Encoding

    Authors: Zeming Chen, Gail Weiss, Eric Mitchell, Asli Celikyilmaz, Antoine Bosselut

    Abstract: Recent studies on transformer-based language models show that they can answer questions by reasoning over knowledge provided as part of the context (i.e., in-context reasoning). However, since the available knowledge is often not filtered for a particular question, in-context reasoning can be sensitive to distractor facts, additional content that is irrelevant to a question but that may be relevan…

    Submitted 5 November, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 22 pages, 8 figures, 10 tables, Accepted to NeurIPS 2023

  41. arXiv:2305.02364  [pdf, other]

    cs.CL

    PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives

    Authors: Silin Gao, Beatriz Borges, Soyoung Oh, Deniz Bayazit, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut

    Abstract: Sustaining coherent and engaging narratives requires dialogue or storytelling agents to understand how the personas of speakers or listeners ground the narrative. Specifically, these agents must infer personas of their listeners to produce statements that cater to their interests. They must also learn to maintain consistent speaker personas for themselves throughout the narrative, so that their co…

    Submitted 26 May, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: ACL 2023, long paper

  42. arXiv:2304.01904  [pdf, other]

    cs.CL

    REFINER: Reasoning Feedback on Intermediate Representations

    Authors: Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings

    Abstract: Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermedi…

    Submitted 4 February, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted at EACL 2024

  43. arXiv:2212.10534  [pdf, other]

    cs.CL

    DISCO: Distilling Counterfactuals with Large Language Models

    Authors: Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, Kyle Richardson

    Abstract: Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at scale. When crowdsourced, such data is typically limited in scale and diversity; when generated using supervised methods, it is computationally expensive to extend t…

    Submitted 5 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023 camera ready, final title change

  44. arXiv:2211.08451  [pdf, other]

    cs.CL

    kogito: A Commonsense Knowledge Inference Toolkit

    Authors: Mete Ismayilzada, Antoine Bosselut

    Abstract: In this paper, we present kogito, an open-source tool for generating commonsense inferences about situations described in text. kogito provides an intuitive and extensible interface to interact with natural language generation models that can be used for hypothesizing commonsense knowledge inference from a textual input. In particular, kogito offers several features for targeted, multi-granularity…

    Submitted 8 March, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: EACL 2023 Camera ready, 9 pages

  45. arXiv:2210.12678  [pdf, other]

    cs.CL

    ComFact: A Benchmark for Linking Contextual Commonsense Knowledge

    Authors: Silin Gao, Jena D. Hwang, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut

    Abstract: Understanding rich narratives, such as dialogues and stories, often requires natural language processing systems to access relevant knowledge from commonsense knowledge graphs. However, these systems typically retrieve facts from KGs using simple heuristics that disregard the complex challenges of identifying situationally-relevant commonsense knowledge (e.g., contextualization, implicitness, ambi…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022, long paper

  46. arXiv:2210.09338  [pdf, other]

    cs.CL cs.AI cs.LG

    Deep Bidirectional Language-Knowledge Graph Pretraining

    Authors: Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D Manning, Percy Liang, Jure Leskovec

    Abstract: Pretraining a language model (LM) on text has been shown to help various downstream NLP tasks. Recent works show that a knowledge graph (KG) can complement text data, offering structured background knowledge that provides a useful scaffold for reasoning. However, these works are not pretrained to learn a deep fusion of the two modalities at scale, limiting the potential to acquire fully joint repr…

    Submitted 18 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Published at NeurIPS 2022. Code, data, and trained models are available at https://github.com/michiyasunaga/dragon

  47. arXiv:2206.06520  [pdf, other]

    cs.AI cs.CL

    Memory-Based Model Editing at Scale

    Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn

    Abstract: Even the largest neural networks make errors, and once-correct predictions can become invalid as the world changes. Model editors make local updates to the behavior of base (pre-trained) models to inject updated knowledge or correct undesirable behaviors. Existing model editors have shown promise, but also suffer from insufficient expressiveness: they struggle to accurately model an edit's intende…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: ICML 2022. Project site at https://sites.google.com/view/serac-editing

  48. arXiv:2205.12672  [pdf, other]

    cs.CL

    Discovering Language-neutral Sub-networks in Multilingual Language Models

    Authors: Negar Foroutan, Mohammadreza Banaei, Remi Lebret, Antoine Bosselut, Karl Aberer

    Abstract: Multilingual pre-trained language models transfer remarkably well on cross-lingual downstream tasks. However, the extent to which they learn language-neutral representations (i.e., shared representations that encode similar phenomena across languages), and the effect of such representations on cross-lingual transfer performance, remain open questions. In this work, we conceptualize language neutra…

    Submitted 30 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  49. arXiv:2205.12485  [pdf, other]

    cs.CL cs.AI

    Conditional set generation using Seq2seq models

    Authors: Aman Madaan, Dheeraj Rajagopal, Niket Tandon, Yiming Yang, Antoine Bosselut

    Abstract: Conditional set generation learns a mapping from an input sequence of tokens to a set. Several NLP tasks, such as entity typing and dialogue emotion tagging, are instances of set generation. Seq2Seq models, a popular choice for set generation, treat a set as a sequence and do not fully leverage its key properties, namely order-invariance and cardinality. We propose a novel algorithm for effectivel…

    Submitted 24 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  50. arXiv:2202.09381  [pdf, other]

    cs.CL

    Synthetic Disinformation Attacks on Automated Fact Verification Systems

    Authors: Yibing Du, Antoine Bosselut, Christopher D. Manning

    Abstract: Automated fact-checking is a needed technology to curtail the spread of online misinformation. One current framework for such solutions proposes to verify claims by retrieving supporting or refuting evidence from related textual sources. However, the realistic use cases for fact-checkers will require verifying claims against evidence sources that could be affected by the same misinformation. Furth…

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: AAAI 2022