Showing 1–3 of 3 results for author: Kostkan, J

Searching in archive cs.

  1. arXiv:2502.13595  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    MMTEB: Massive Multilingual Text Embedding Benchmark

    Authors: Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, Márton Kardos, Ashwin Mathur, David Stap, Jay Gala, Wissam Siblini, Dominik Krzemiński, Genta Indra Winata, Saba Sturua, Saiteja Utpala, Mathieu Ciancone, Marion Schaeffer, Gabriel Sequeira, Diganta Misra, Shreeya Dhakal, Jonathan Rystrøm, Roman Solomatin, Ömer Çağatan, Akash Kundu, Martin Bernstorff, Shitao Xiao, Akshita Sukhlecha, Bhavish Pahwa, et al. (61 additional authors not shown)

    Abstract: Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) - a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ langua…

    Submitted 8 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted for ICLR: https://openreview.net/forum?id=zl3pfz4VCV

  2. arXiv:2406.09556  [pdf, other]

    cs.LG cs.CL stat.ML

    $S^3$ -- Semantic Signal Separation

    Authors: Márton Kardos, Jan Kostkan, Arnault-Quentin Vermillet, Kristoffer Nielbo, Kenneth Enevoldsen, Roberta Rocca

    Abstract: Topic models are useful tools for discovering latent semantic structures in large textual corpora. Recent efforts have been oriented at incorporating contextual representations in topic modeling and have been shown to outperform classical topic models. These approaches are typically slow, volatile, and require heavy preprocessing for optimal results. We present Semantic Signal Separation ($S^3$),…

    Submitted 19 May, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 24 pages, 13 figures (main manuscript has 9 pages and 7 figures); The paper has been adjusted according to reviewers' feedback

    ACM Class: I.2.7

  3. arXiv:2109.08589  [pdf, other]

    cs.CY cs.IR cs.IT

    Event Flow -- How Events Shaped the Flow of the News, 1950-1995

    Authors: Melvin Wevers, Jan Kostkan, Kristoffer L. Nielbo

    Abstract: This article relies on information-theoretic measures to examine how events impacted the news for the period 1950-1995. Moreover, we present a method for event characterization in (unstructured) textual sources, offering a taxonomy of events based on the different ways they impacted the flow of news information. The results give us a better understanding of the relationship between events and thei…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: 12 pages, 8 figures, conference: Computational Humanities Research Conference 2021