Showing 1–11 of 11 results for author: Garneau, N

  1. arXiv:2410.02499  [pdf, other]

    cs.CL

    Defining Knowledge: Bridging Epistemology and Large Language Models

    Authors: Constanza Fierro, Ruchira Dhar, Filippos Stamatiou, Nicolas Garneau, Anders Søgaard

    Abstract: Knowledge claims are abundant in the literature on large language models (LLMs); but can we say that GPT-4 truly "knows" the Earth is round? To address this question, we review standard definitions of knowledge in epistemology and we formalize interpretations applicable to LLMs. In doing so, we identify inconsistencies and gaps in how current NLP research conceptualizes knowledge with respect to e…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024

  2. arXiv:2408.11940  [pdf, other]

    cs.CL

    The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al

    Authors: Nicolas Garneau, Olivier Bolduc

    Abstract: In Quebec and Canadian courts, the transcription of court proceedings is a critical task for appeal purposes and must be certified by an official court reporter. The limited availability of qualified reporters and the high costs associated with manual transcription underscore the need for more efficient solutions. This paper examines the potential of Automatic Speech Recognition (ASR) systems to a…

    Submitted 21 August, 2024; originally announced August 2024.

  3. arXiv:2404.06833  [pdf, other]

    cs.CL

    Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge

    Authors: Li Zhou, Taelin Karidi, Wanlong Liu, Nicolas Garneau, Yong Cao, Wenyu Chen, Haizhou Li, Daniel Hershcovich

    Abstract: Recent studies have highlighted the presence of cultural biases in Large Language Models (LLMs), yet often lack a robust methodology to dissect these phenomena comprehensively. Our work aims to bridge this gap by delving into the Food domain, a universally relevant yet culturally diverse aspect of human life. We introduce FmLAMA, a multilingual dataset centered on food-related cultural facts and v…

    Submitted 6 February, 2025; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: cultural bias analysis, cultural knowledge probing, large language models, cultural NLP; Accepted by NAACL2025

    Journal ref: NAACL2025

  4. arXiv:2404.03036  [pdf, other]

    cs.CL

    MuLan: A Study of Fact Mutability in Language Models

    Authors: Constanza Fierro, Nicolas Garneau, Emanuele Bugliarello, Yova Kementchedjhieva, Anders Søgaard

    Abstract: Facts are subject to contingencies and can be true or false in different circumstances. One such contingency is time, wherein some facts mutate over a given period, e.g., the president of a country or the winner of a championship. Trustworthy language models ideally identify mutable facts as such and process them accordingly. We create MuLan, a benchmark for evaluating the ability of English langu…

    Submitted 3 April, 2024; originally announced April 2024.

  5. arXiv:2403.00876  [pdf, other]

    cs.CL cs.AI

    Word Order and World Knowledge

    Authors: Qinghua Zhao, Vinit Ravishankar, Nicolas Garneau, Anders Søgaard

    Abstract: Word order is an important concept in natural language, and in this work, we study how word order affects the induction of world knowledge from raw text using language models. We use word analogies to probe for such knowledge. Specifically, in addition to the natural word order, we first respectively extract texts of six fixed word orders from five languages and then pretrain the language models o…

    Submitted 1 March, 2024; originally announced March 2024.

  6. arXiv:2305.07507  [pdf, other]

    cs.CL

    LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

    Authors: Ilias Chalkidis, Nicolas Garneau, Catalina Goanta, Daniel Martin Katz, Anders Søgaard

    Abstract: In this work, we conduct a detailed analysis on the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities which we define as the upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora use…

    Submitted 22 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: 9 pages, long paper at ACL 2023 proceedings

  7. arXiv:2208.04223  [pdf, other]

    cs.IR cs.LG

    Beer2Vec : Extracting Flavors from Reviews for Thirst-Quenching Recommandations

    Authors: Jean-Thomas Baillargeon, Nicolas Garneau

    Abstract: This paper introduces the Beer2Vec model that allows the most popular alcoholic beverage in the world to be encoded into vectors enabling flavorful recommendations. We present our algorithm using a unique dataset focused on the analysis of craft beers. We thoroughly explain how we encode the flavors and how useful, from an empirical point of view, the beer vectors are to generate meaningful recomm…

    Submitted 4 August, 2022; originally announced August 2022.

  8. arXiv:2011.12183  [pdf, other]

    cs.CL

    Generating Intelligible Plumitifs Descriptions: Use Case Application with Ethical Considerations

    Authors: David Beauchemin, Nicolas Garneau, Eve Gaumond, Pierre-Luc Déziel, Richard Khoury, Luc Lamontagne

    Abstract: Plumitifs (dockets) were initially a tool for law clerks. Nowadays, they are used as summaries presenting all the steps of a judicial case. Information concerning parties' identity, jurisdiction in charge of administering the case, and some information relating to the nature and the course of the proceeding are available through plumitifs. They are publicly accessible but barely understandable; the…

    Submitted 24 November, 2020; originally announced November 2020.

    Comments: INLG 2020

  9. arXiv:1912.06876  [pdf, other]

    cs.LG stat.ML

    Attending Form and Context to Generate Specialized Out-of-Vocabulary Words Representations

    Authors: Nicolas Garneau, Jean-Samuel Leboeuf, Yuval Pinter, Luc Lamontagne

    Abstract: We propose a new contextual-compositional neural network layer that handles out-of-vocabulary (OOV) words in natural language processing (NLP) tagging tasks. This layer consists of a model that attends to both the character sequence and the context in which the OOV words appear. We show that our model learns to generate task-specific and sentence-dependent OOV word representations without…

    Submitted 14 December, 2019; originally announced December 2019.

  10. arXiv:1912.01706  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings: Making the Method Robustly Reproducible as Well

    Authors: Nicolas Garneau, Mathieu Godbout, David Beauchemin, Audrey Durand, Luc Lamontagne

    Abstract: In this paper, we reproduce the experiments of Artetxe et al. (2018b) regarding the robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. We show that the reproduction of their method is indeed feasible with some minor assumptions. We further investigate the robustness of their model by introducing four new languages that are less similar to English than the…

    Submitted 3 March, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted at REPROLANG@LREC2020

  11. arXiv:1903.00724  [pdf, ps, other]

    cs.CL cs.LG

    Predicting and interpreting embeddings for out of vocabulary words in downstream tasks

    Authors: Nicolas Garneau, Jean-Samuel Leboeuf, Luc Lamontagne

    Abstract: We propose a novel way to handle out of vocabulary (OOV) words in downstream natural language processing (NLP) tasks. We implement a network that predicts useful embeddings for OOV words based on their morphology and on the context in which they appear. Our model also incorporates an attention mechanism indicating the focus allocated to the left context words, the right context words or the word's…

    Submitted 2 March, 2019; originally announced March 2019.

    Comments: 2 pages, 0 figures, 2 tables

    Journal ref: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP