Showing 1–6 of 6 results for author: Lasri, K

  1. arXiv:2403.19260

    cs.CL

    NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data

    Authors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel P. Fraiberger

    Abstract: To address the global issue of online hate, hate speech detection (HSD) systems are typically developed on datasets from the United States, thereby failing to generalize to English dialects from the Majority World. Furthermore, HSD models are often evaluated on non-representative samples, raising concerns about overestimating model performance in real-world settings. In this work, we introduce Nai…

    Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: ACL 2024 main conference. Data and models available at https://github.com/worldbank/NaijaHate

  2. arXiv:2211.04427

    cs.CL cs.LG cs.NE

    Word Order Matters when you Increase Masking

    Authors: Karim Lasri, Alessandro Lenci, Thierry Poibeau

    Abstract: Word order, an essential property of natural languages, is injected in Transformer-based neural language models using position encoding. However, recent experiments have shown that explicit position encoding is not always useful, since some models without such feature managed to achieve state-of-the art performance on some tasks. To understand better this phenomenon, we examine the effect of remov…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Accepted at EMNLP 2022 (main conference)

  3. State-of-the-art generalisation research in NLP: A taxonomy and review

    Authors: Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin

    Abstract: The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In this paper, we lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation…

    Submitted 12 January, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: This preprint was published as an Analysis article in Nature Machine Intelligence. Please refer to the published version when citing this work. 28 pages of content + 6 pages of appendix + 52 pages of references

    Journal ref: Nat Mach Intell 5, 1161-1174 (2023)

  4. arXiv:2209.10538

    cs.CL

    Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans vs. BERT

    Authors: Karim Lasri, Olga Seminck, Alessandro Lenci, Thierry Poibeau

    Abstract: Both humans and neural language models are able to perform subject-verb number agreement (SVA). In principle, semantics shouldn't interfere with this task, which only requires syntactic knowledge. In this work we test whether meaning interferes with this type of agreement in English in syntactic structures of various complexities. To do so, we generate both semantically well-formed and nonsensical…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: COLING 2022 Main Conference (The 29th international conference on computational linguistics)

  5. Probing for the Usage of Grammatical Number

    Authors: Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell

    Abstract: A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious-i.e., the model might not rely on it when making predictions. In this paper, we try to find encodings that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without…

    Submitted 22 May, 2024; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: ACL 2022 (Main Conference) The discussion section had been inadvertently removed before the article was published on arxiv

  6. Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task

    Authors: Karim Lasri, Alessandro Lenci, Thierry Poibeau

    Abstract: Although transformer-based Neural Language Models demonstrate impressive performance on a variety of tasks, their generalization abilities are not well understood. They have been shown to perform strongly on subject-verb number agreement in a wide array of settings, suggesting that they learned to track syntactic dependencies during their training even without explicit supervision. In this paper,…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: ACL 2022 (Findings)