Showing 1–5 of 5 results for author: Alimova, I

  1. arXiv:2502.11926  [pdf, ps, other]

    cs.CL

    BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

    Authors: Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine de Kock, Nirmal Surange, Daniela Teodorescu, Ibrahim Said Ahmad, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino D. M. A. Ali, Ilseyar Alimova, Vladimir Araujo, Nikolay Babakov, Naomi Baes, Ana-Maria Bucur, Andiswa Bukula, Guanqun Cao, Rodrigo Tufino Cardenas, Rendi Chevi, Chiamaka Ijeoma Chukwuneke, Alexandra Ciobotaru, Daryna Dementieva, et al. (23 additional authors not shown)

    Abstract: People worldwide use language in subtle and complex ways to express emotions. Although emotion recognition--an umbrella term for several NLP tasks--impacts various applications within NLP and beyond, most work in this area has focused on high-resource languages. This has led to significant disparities in research efforts and proposed solutions, particularly for under-resourced languages, which oft…

    Submitted 29 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted at ACL 2025 (Main)

  2. arXiv:2403.17553  [pdf, other]

    cs.CL

    RuBia: A Russian Language Bias Detection Dataset

    Authors: Veronika Grigoreva, Anastasiia Ivanova, Ilseyar Alimova, Ekaterina Artemova

    Abstract: Warning: this work contains upsetting or disturbing content. Large language models (LLMs) tend to learn the social and cultural biases present in the raw pre-training data. To test if an LLM's behavior is fair, functional datasets are employed, and due to their purpose, these datasets are highly language and culture-specific. In this paper, we address a gap in the scope of multilingual bias eval…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: accepted to LREC-COLING 2024

  3. arXiv:2311.08143  [pdf, other]

    cs.CL

    Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval

    Authors: Konstantin Yakovlev, Gregory Polyakov, Ilseyar Alimova, Alexander Podolskiy, Andrey Bout, Sergey Nikolenko, Irina Piontkovskaya

    Abstract: A recent trend in multimodal retrieval is related to postprocessing test set results via the dual-softmax loss (DSL). While this approach can bring significant improvements, it usually presumes that an entire matrix of test samples is available as DSL input. This work introduces a new postprocessing approach based on Sinkhorn transformations that outperforms DSL. Further, we propose a new postproc…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: SIGIR 2023

  4. Selection of pseudo-annotated data for adverse drug reaction classification across drug groups

    Authors: Ilseyar Alimova, Elena Tutubalina

    Abstract: Automatic monitoring of adverse drug events (ADEs) or reactions (ADRs) is currently receiving significant attention from the biomedical community. In recent years, user-generated data on social media has become a valuable resource for this task. Neural models have achieved impressive performance on automatic text classification for ADR detection. Yet, training and evaluation of these methods are c…

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: Accepted to AIST 2021

    Journal ref: Analysis of Images, Social Networks and Texts. AIST 2021. Lecture Notes in Computer Science, vol 13217. Springer, Cham

  5. The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews

    Authors: Elena Tutubalina, Ilseyar Alimova, Zulfat Miftahutdinov, Andrey Sakhovskiy, Valentin Malykh, Sergey Nikolenko

    Abstract: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus itself consists of two parts, the raw one and the labelled one. The raw part includes 1.4 million health-related user-generated texts collected from…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: 9 pages, 9 tables, 4 figures

    Journal ref: Bioinformatics, 2020