Showing 1–15 of 15 results for author: Dubossarsky, H

  1. arXiv:2505.24834

    cs.CL

    Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks

    Authors: Roksana Goworek, Haim Dubossarsky

    Abstract: Cross-lingual transfer allows models to perform tasks in languages unseen during training and is often assumed to benefit from increased multilinguality. In this work, we challenge this assumption in the context of two underexplored, sense-aware tasks: polysemy disambiguation and lexical semantic change. Through a large-scale analysis across 28 languages, we show that multilingual training is neit…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 8 pages, 8 figures

  2. arXiv:2505.23714

    cs.CL cs.AI

    SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

    Authors: Roksana Goworek, Harpal Karlcut, Muhammad Shezad, Nijaguna Darshana, Abhishek Mane, Syam Bondada, Raghav Sikka, Ulvi Mammadov, Rauf Allahverdiyev, Sriram Purighella, Paridhi Gupta, Muhinyia Ndegwa, Haim Dubossarsky

    Abstract: This paper addresses the critical need for high-quality evaluation datasets in low-resource languages to advance cross-lingual transfer. While cross-lingual transfer offers a key strategy for leveraging multilingual pretraining to expand language technologies to understudied and typologically diverse languages, its effectiveness is dependent on quality and suitable benchmarks. We release new sense…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 8 pages, 22 figures, submitted to SIGTYP 2025 workshop in ACL

  3. arXiv:2505.20435

    cs.LG cs.AI cs.CG math.AT

    Holes in Latent Space: Topological Signatures Under Adversarial Influence

    Authors: Aideen Fay, Inés García-Redondo, Qiquan Wang, Haim Dubossarsky, Anthea Monod

    Abstract: Understanding how adversarial conditions affect language models requires techniques that capture both global structure and local detail within high-dimensional activation spaces. We propose persistent homology (PH), a tool from topological data analysis, to systematically characterize multiscale latent space dynamics in LLMs under two distinct attack modes -- backdoor fine-tuning and indirect prom…

    Submitted 26 May, 2025; originally announced May 2025.

  4. arXiv:2503.08042

    cs.CL

    LSC-Eval: A General Framework to Evaluate Methods for Assessing Dimensions of Lexical Semantic Change Using LLM-Generated Synthetic Data

    Authors: Naomi Baes, Raphaël Merx, Nick Haslam, Ekaterina Vylomova, Haim Dubossarsky

    Abstract: Lexical Semantic Change (LSC) provides insight into cultural and social dynamics. Yet, the validity of methods for measuring different kinds of LSC remains unestablished due to the absence of historical benchmark datasets. To address this gap, we propose LSC-Eval, a novel three-stage general-purpose evaluation framework to: (1) develop a scalable methodology for generating synthetic datasets that…

    Submitted 4 June, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted to ACL Findings (9-page long paper; 35 pages total including limitations, appendices and references)

  5. arXiv:2404.18570

    cs.CL

    Analyzing Semantic Change through Lexical Replacements

    Authors: Francesco Periti, Pierluigi Cassotti, Haim Dubossarsky, Nina Tahmasebi

    Abstract: Modern language models are capable of contextualizing words based on their surrounding context. However, this capability is often compromised due to semantic change that leads to words being used in new, unexpected contexts not encountered during pre-training. In this paper, we model semantic change by studying the effect of unexpected contexts introduced by lexical replacements.…

    Submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2401.14040

    cs.CL

    (Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection

    Authors: Francesco Periti, Haim Dubossarsky, Nina Tahmasebi

    Abstract: In the universe of Natural Language Processing, Transformer-based language models like BERT and (Chat)GPT have emerged as lexical superheroes with great power to solve open research problems. In this paper, we specifically focus on the temporal problem of semantic change, and evaluate their ability to solve two diachronic extensions of the Word-in-Context (WiC) task: TempoWiC and HistoWiC. In part…

    Submitted 29 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted to the Findings of EACL 2024 (https://aclanthology.org/2024.findings-eacl.29.pdf)

  7. arXiv:2305.13214

    cs.CL

    Atomic Inference for NLI with Generated Facts as Atoms

    Authors: Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Oana-Maria Camburu, Marek Rei

    Abstract: With recent advances, neural models can achieve human-level performance on various natural language tasks. However, there are no guarantees that any explanations from these models are faithful, i.e. that they reflect the inner workings of the model. Atomic inference overcomes this issue, providing interpretable and faithful model decisions. This approach involves making predictions for different c…

    Submitted 1 October, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2024

    ACM Class: I.2.7

  8. arXiv:2304.06337

    cs.CL

    Computational modeling of semantic change

    Authors: Nina Tahmasebi, Haim Dubossarsky

    Abstract: In this chapter we provide an overview of computational modeling for semantic change using large and semi-large textual corpora. We aim to provide a key for the interpretation of relevant methods and evaluation techniques, and also provide insights into important aspects of the computational study of semantic change. We discuss the pros and cons of different classes of models with respect to the p…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: This chapter is submitted to Routledge Handbook of Historical Linguistics, 2nd Edition

  9. arXiv:2205.11432

    cs.CL cs.LG

    Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models

    Authors: Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Marek Rei

    Abstract: Current Natural Language Inference (NLI) models achieve impressive results, sometimes outperforming humans when evaluating on in-distribution test sets. However, as these models are known to learn from annotation artefacts and dataset biases, it is unclear to what extent the models are learning the task of NLI instead of learning from shallow heuristics in their training data. We address this issu…

    Submitted 21 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP 2022

  10. arXiv:2104.08540

    cs.CL

    DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

    Authors: Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, Barbara McGillivray

    Abstract: Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental annotation process, the choice for a clustering algo…

    Submitted 8 July, 2024; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, and Barbara McGillivray. 2021. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7079--7091, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics

  11. arXiv:2101.07668

    cs.CL

    Challenges for Computational Lexical Semantic Change

    Authors: Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg, Haim Dubossarsky

    Abstract: The computational study of lexical semantic change (LSC) has taken off in the past few years and we are seeing increasing interest in the field, from both computational sciences and linguistics. Most of the research so far has focused on methods for modelling and detecting semantic change using large diachronic textual data, with the majority of the approaches employing neural embeddings. While me…

    Submitted 19 January, 2021; originally announced January 2021.

    Comments: To appear in: Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, Simon Hengchen (eds). Computational Approaches to Semantic Change. Berlin: Language Science Press. [preliminary page numbering]

  12. arXiv:2007.11464

    cs.CL

    SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection

    Authors: Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky, Nina Tahmasebi

    Abstract: Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders progress. We present the results of the first shared t…

    Submitted 28 August, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: SemEval@COLING2020, 12 pages

  13. arXiv:2004.07790

    cs.LG cs.AI cs.CL stat.ML

    Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training

    Authors: Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Sebastian Riedel, Tim Rocktäschel

    Abstract: Natural Language Inference (NLI) datasets contain annotation artefacts resulting in spurious correlations between the natural language utterances and their respective entailment classes. These artefacts are exploited by neural networks even when only considering the hypothesis and ignoring the premise, leading to unwanted biases. Belinkov et al. (2019b) proposed tackling this problem via adversari…

    Submitted 27 May, 2021; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted at EMNLP 2020

  14. arXiv:2001.11136

    cs.CL

    The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures

    Authors: Haim Dubossarsky, Ivan Vulić, Roi Reichart, Anna Korhonen

    Abstract: Performance in cross-lingual NLP tasks is impacted by the (dis)similarity of languages at hand: e.g., previous work has suggested there is a connection between the expected success of bilingual lexicon induction (BLI) and the assumption of (approximate) isomorphism between monolingual embedding spaces. In this work we present a large-scale study focused on the correlations between monolingual embe…

    Submitted 12 October, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

    Comments: EMNLP 2020: Long paper

  15. Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change

    Authors: Haim Dubossarsky, Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg

    Abstract: State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We have empirically tested the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing o…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: To appear in the 57th Annual Meeting of the Association for Computational Linguistics (ACL2019)