
Showing 1–26 of 26 results for author: Kolesnikova, O

Searching in archive cs.
  1. arXiv:2506.06929  [pdf]

    cs.CL

    Hybrid Extractive Abstractive Summarization for Multilingual Sentiment Analysis

    Authors: Mikhail Krasitskii, Grigori Sidorov, Olga Kolesnikova, Liliana Chanona Hernandez, Alexander Gelbukh

    Abstract: We propose a hybrid approach for multilingual sentiment analysis that combines extractive and abstractive summarization to address the limitations of standalone methods. The model integrates TF-IDF-based extraction with a fine-tuned XLM-R abstractive module, enhanced by dynamic thresholding and cultural adaptation. Experiments across 10 languages show significant improvements over baselines, achie…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 6 pages

  2. Explainable AI: XAI-Guided Context-Aware Data Augmentation

    Authors: Melkamu Abay Mersha, Mesay Gemeda Yigezu, Atnafu Lambebo Tonja, Hassan Shakil, Samer Iskander, Olga Kolesnikova, Jugal Kalita

    Abstract: Explainable AI (XAI) has emerged as a powerful tool for improving the performance of AI models, going beyond providing model transparency and interpretability. The scarcity of labeled data remains a fundamental challenge in developing robust and generalizable AI models, particularly for low-resource languages. Conventional data augmentation techniques introduce noise, cause semantic drift, disrupt…

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2504.02863  [pdf]

    cs.CL cs.SI

    GS_DravidianLangTech@2025: Women Targeted Abusive Texts Detection on Social Media

    Authors: Girma Yohannis Bade, Zahra Ahani, Olga Kolesnikova, José Luis Oropeza, Grigori Sidorov

    Abstract: The increasing misuse of social media has become a concern; however, technological solutions are being developed to moderate its content effectively. This paper focuses on detecting abusive texts targeting women on social media platforms. Abusive speech refers to communication intended to harm or incite hatred against vulnerable individuals or groups. Specifically, this study aims to identify abus…

    Submitted 31 March, 2025; originally announced April 2025.

  4. arXiv:2504.00265  [pdf]

    cs.CL

    Multilingual Sentiment Analysis of Summarized Texts: A Cross-Language Study of Text Shortening Effects

    Authors: Mikhail Krasitskii, Grigori Sidorov, Olga Kolesnikova, Liliana Chanona Hernandez, Alexander Gelbukh

    Abstract: Summarization significantly impacts sentiment analysis across languages with diverse morphologies. This study examines extractive and abstractive summarization effects on sentiment classification in English, German, French, Spanish, Italian, Finnish, Hungarian, and Arabic. We assess sentiment shifts post-summarization using multilingual transformers (mBERT, XLM-RoBERTa, T5, and BART) and language-…

    Submitted 31 March, 2025; originally announced April 2025.

  5. arXiv:2503.23295  [pdf]

    cs.CL

    Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions

    Authors: Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, Alexander Gelbukh

    Abstract: The sentiment analysis task in Tamil-English code-mixed texts has been explored using advanced transformer-based models. Challenges from grammatical inconsistencies, orthographic variations, and phonetic ambiguities have been addressed. The limitations of existing datasets and annotation gaps have been examined, emphasizing the need for larger and more diverse corpora. Transformer architectures, i…

    Submitted 29 March, 2025; originally announced March 2025.

  6. arXiv:2503.18253  [pdf, other]

    cs.CL

    Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages

    Authors: Tadesse Destaw Belay, Dawit Ketema Gete, Abinew Ali Ayele, Olga Kolesnikova, Grigori Sidorov, Seid Muhie Yimam

    Abstract: In this digital world, people freely express their emotions using different social media platforms. As a result, modeling and integrating emotion-understanding models are vital for various human-computer interaction tasks such as decision-making, product and customer feedback analysis, political promotions, marketing research, and social media monitoring. As users express different emotions simult…

    Submitted 23 March, 2025; originally announced March 2025.

  7. arXiv:2503.10688  [pdf, other]

    cs.CL

    CULEMO: Cultural Lenses on Emotion -- Benchmarking LLMs for Cross-Cultural Emotion Understanding

    Authors: Tadesse Destaw Belay, Ahmed Haj Ahmed, Alvin Grissom II, Iqra Ameer, Grigori Sidorov, Olga Kolesnikova, Seid Muhie Yimam

    Abstract: NLP research has increasingly focused on subjective tasks such as emotion analysis. However, existing emotion benchmarks suffer from two major shortcomings: (1) they largely rely on keyword-based emotion recognition, overlooking crucial cultural dimensions required for deeper emotion understanding, and (2) many are created by translating English-annotated data into other languages, leading to pote…

    Submitted 27 May, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: ACL-main 2025

  8. arXiv:2502.09640  [pdf]

    cs.CL cs.AI

    Online Social Support Detection in Spanish Social Media Texts

    Authors: Moein Shahiki Tash, Luis Ramos, Zahra Ahani, Raul Monroy, Olga Kolesnikova, Hiram Calvo, Grigori Sidorov

    Abstract: The advent of social media has transformed communication, enabling individuals to share their experiences, seek support, and participate in diverse discussions. While extensive research has focused on identifying harmful content like hate speech, the recognition and promotion of positive and supportive interactions remain largely unexplored. This study proposes an innovative approach to detecting…

    Submitted 9 February, 2025; originally announced February 2025.

  9. Comparative Approaches to Sentiment Analysis Using Datasets in Major European and Arabic Languages

    Authors: Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, Alexander Gelbukh

    Abstract: This study explores transformer-based models such as BERT, mBERT, and XLM-R for multilingual sentiment analysis across diverse linguistic structures. Key contributions include the identification of XLM-R's superior adaptability in morphologically complex languages, achieving accuracy levels above 88%. The work highlights fine-tuning strategies and emphasizes their significance for improving sentime…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: 11th International Conference on Advances in Computer Science and Information Technology (ACSTY 2025)

  10. arXiv:2501.03370  [pdf]

    cs.CL cs.AI cs.HC cs.LG

    Advanced Machine Learning Techniques for Social Support Detection on Social Media

    Authors: Olga Kolesnikova, Moein Shahiki Tash, Zahra Ahani, Ameeta Agrawal, Raul Monroy, Grigori Sidorov

    Abstract: The widespread use of social media highlights the need to understand its impact, particularly the role of online social support. This study uses a dataset focused on online social support, which includes binary and multiclass classifications of social support content on social media. The classification of social support is divided into three tasks. The first task focuses on distinguishing between…

    Submitted 6 January, 2025; originally announced January 2025.

  11. arXiv:2412.17837  [pdf, other]

    cs.CL

    Evaluating the Capabilities of Large Language Models for Multi-label Emotion Understanding

    Authors: Tadesse Destaw Belay, Israel Abebe Azime, Abinew Ali Ayele, Grigori Sidorov, Dietrich Klakow, Philipp Slusallek, Olga Kolesnikova, Seid Muhie Yimam

    Abstract: Large Language Models (LLMs) show promising learning and reasoning abilities. Compared to other NLP tasks, multilingual and multi-label emotion evaluation tasks are under-explored in LLMs. In this paper, we present EthioEmo, a multi-label emotion classification dataset for four Ethiopian languages, namely, Amharic (amh), Afan Oromo (orm), Somali (som), and Tigrinya (tir). We perform extensive expe…

    Submitted 3 January, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: COLING 2025, main conference, long

  12. arXiv:2410.06428  [pdf]

    cs.CL cs.AI cs.HC cs.LG

    Stress Detection on Code-Mixed Texts in Dravidian Languages using Machine Learning

    Authors: L. Ramos, M. Shahiki-Tash, Z. Ahani, A. Eponon, O. Kolesnikova, H. Calvo

    Abstract: Stress is a common feeling in daily life, but it can affect mental well-being in some situations, so the development of robust detection models is imperative. This study introduces a methodical approach to stress identification in code-mixed texts for Dravidian languages. The challenge encompassed two datasets, targeting the Tamil and Telugu languages respectively. This proposal underscores the impor…

    Submitted 8 October, 2024; originally announced October 2024.

  13. arXiv:2410.02609  [pdf, other]

    cs.CL

    Ethio-Fake: Cutting-Edge Approaches to Combat Fake News in Under-Resourced Languages Using Explainable AI

    Authors: Mesay Gemeda Yigezu, Melkamu Abay Mersha, Girma Yohannis Bade, Jugal Kalita, Olga Kolesnikova, Alexander Gelbukh

    Abstract: The proliferation of fake news has emerged as a significant threat to the integrity of information dissemination, particularly on social media platforms. Misinformation can spread quickly due to the ease of creating and disseminating content, affecting public opinion and sociopolitical events. Identifying false information is therefore essential to reducing its negative consequences and maintainin…

    Submitted 3 October, 2024; originally announced October 2024.

    Journal ref: ACLing 2024: 6th International Conference on AI in Computational Linguistics

  14. arXiv:2409.02836  [pdf]

    cs.CL cs.AI cs.CE cs.LG

    Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models

    Authors: Moein Shahiki Tash, Zahra Ahani, Mohim Tash, Olga Kolesnikova, Grigori Sidorov

    Abstract: This study performs analysis of Predictive statements, Hope speech, and Regret Detection behaviors within cryptocurrency-related discussions, leveraging advanced natural language processing techniques. We introduce a novel classification scheme named "Prediction statements," categorizing comments into Predictive Incremental, Predictive Decremental, Predictive Neutral, or Non-Predictive categories.…

    Submitted 4 September, 2024; originally announced September 2024.

  15. arXiv:2405.03084  [pdf]

    cs.CL cs.LG

    Analyzing Emotional Trends from X platform using SenticNet: A Comparative Analysis with Cryptocurrency Price

    Authors: Moein Shahiki Tash, Zahra Ahani, Olga Kolesnikova, Grigori Sidorov

    Abstract: This study delves into the relationship between emotional trends from X platform data and the market dynamics of well-known cryptocurrencies Cardano, Binance, Fantom, Matic, and Ripple over the period from October 2022 to March 2023. Leveraging SenticNet, we identified emotions like Fear and Anxiety, Rage and Anger, Grief and Sadness, Delight and Pleasantness, Enthusiasm and Eagerness, and Delight…

    Submitted 5 May, 2024; originally announced May 2024.

  16. arXiv:2404.05365  [pdf, other]

    cs.CL

    NLP Progress in Indigenous Latin American Languages

    Authors: Atnafu Lambebo Tonja, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar Solorio

    Abstract: The paper focuses on the marginalization of indigenous language communities in the face of rapid technological advancements. We highlight the cultural richness of these languages and the risk they face of being overlooked in the realm of Natural Language Processing (NLP). We aim to bridge the gap between these communities and researchers, emphasizing the need for inclusive technological advancemen…

    Submitted 12 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024

  17. arXiv:2403.19365  [pdf, other]

    cs.CL

    EthioMT: Parallel Corpus for Low-resource Ethiopian Languages

    Authors: Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh, Jugal Kalita

    Abstract: Recent research in natural language processing (NLP) has achieved impressive performance in tasks such as machine translation (MT), news classification, and question-answering in high-resource languages. However, the performance of MT leaves much to be desired for low-resource languages. This is due to the smaller size of available parallel corpora in these languages, if such corpora are available…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted at The Fifth workshop on Resources for African Indigenous Languages (RAIL) 2024 (LREC-COLING 2024)

  18. arXiv:2403.13737  [pdf, ps, other]

    cs.CL

    EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation

    Authors: Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam

    Abstract: Large language models (LLMs) have gained popularity recently due to their outstanding performance in various downstream Natural Language Processing (NLP) tasks. However, low-resource languages are still lagging behind current state-of-the-art (SOTA) developments in the field of NLP due to insufficient resources to train LLMs. Ethiopian languages exhibit remarkable linguistic diversity, encompassin…

    Submitted 23 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-Coling 2024

  19. arXiv:2312.04764  [pdf, other]

    cs.CL

    First Attempt at Building Parallel Corpora for Machine Translation of Northeast India's Very Low-Resource Languages

    Authors: Atnafu Lambebo Tonja, Melkamu Mersha, Ananya Kalita, Olga Kolesnikova, Jugal Kalita

    Abstract: This paper presents the creation of initial bilingual corpora for thirteen very low-resource languages of India, all from Northeast India. It also presents the results of initial translation efforts in these languages. It creates the first-ever parallel corpora for these languages and provides initial benchmark neural machine translation results for these languages. We intend to extend these corpo…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted to ICON 2023

  20. arXiv:2311.04189  [pdf]

    cs.CL

    SpaDeLeF: A Dataset for Hierarchical Classification of Lexical Functions for Collocations in Spanish

    Authors: Yevhen Kostiuk, Grigori Sidorov, Olga Kolesnikova

    Abstract: In natural language processing (NLP), lexical function is a concept to unambiguously represent semantic and syntactic features of words and phrases in text first crafted in the Meaning-Text Theory. Hierarchical classification of lexical functions involves organizing these features into a tree-like hierarchy of categories or labels. This is a challenging task as it requires a good understanding of…

    Submitted 7 November, 2023; originally announced November 2023.

  21. arXiv:2306.01261  [pdf, other]

    cs.CL

    Automatic Translation of Hate Speech to Non-hate Speech in Social Media Texts

    Authors: Yevhen Kostiuk, Atnafu Lambebo Tonja, Grigori Sidorov, Olga Kolesnikova

    Abstract: In this paper, we investigate the issue of hate speech by presenting a novel task of translating hate speech into non-hate speech text while preserving its meaning. As a case study, we use Spanish texts. We provide a dataset and several baselines as a starting point for further research in the task. We evaluated our baseline results using multiple metrics, including BLEU scores. The aim of this st…

    Submitted 2 June, 2023; originally announced June 2023.

  22. arXiv:2305.17406  [pdf, other]

    cs.CL

    Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models

    Authors: Atnafu Lambebo Tonja, Hellina Hailu Nigatu, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh, Jugal Kalita

    Abstract: This paper describes CIC NLP's submission to the AmericasNLP 2023 Shared Task on machine translation systems for indigenous languages of the Americas. We present the system descriptions for three methods. We used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) -- Helsinki NLP Spanish-English translation model, and experimented with different transfer learning se…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to Third Workshop on NLP for Indigenous Languages of the Americas

  23. arXiv:2305.17404  [pdf, other]

    cs.CL

    Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec

    Authors: Atnafu Lambebo Tonja, Christian Maldonado-Sifuentes, David Alejandro Mendoza Castillo, Olga Kolesnikova, Noé Castro-Sánchez, Grigori Sidorov, Alexander Gelbukh

    Abstract: In this paper, we present a parallel Spanish-Mazatec and Spanish-Mixtec corpus for machine translation (MT) tasks, where Mazatec and Mixtec are two indigenous Mexican languages. We evaluated the usability of the collected corpus using three different approaches: transformer, transfer learning, and fine-tuning pre-trained multilingual MT models. Fine-tuning the Facebook M2M100-48 model outperformed…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to Third Workshop on NLP for Indigenous Languages of the Americas

  24. arXiv:2303.14406  [pdf, other]

    cs.CL

    Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities

    Authors: Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime, Abinew Ali Ayele, Moges Ahmed Mehamed, Olga Kolesnikova, Seid Muhie Yimam

    Abstract: This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This r…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: Accepted to Fourth workshop on Resources for African Indigenous Languages (RAIL), EACL2023

  25. arXiv:2211.14459  [pdf, other]

    cs.CL cs.AI

    Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts

    Authors: Atnafu Lambebo Tonja, Mesay Gemeda Yigezu, Olga Kolesnikova, Moein Shahiki Tash, Grigori Sidorov, Alexander Gelbukh

    Abstract: Using code-mixed data in natural language processing (NLP) research currently gets a lot of attention. Language identification of social media code-mixed text has been an interesting problem of study in recent years due to the advancement and influences of social media in communication. This paper presents the Instituto Politécnico Nacional, Centro de Investigación en Computación (CIC) team's syst…

    Submitted 25 November, 2022; originally announced November 2022.

  26. arXiv:2210.15224  [pdf, other]

    cs.CL

    The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation

    Authors: Tadesse Destaw Belay, Atnafu Lambebo Tonja, Olga Kolesnikova, Seid Muhie Yimam, Abinew Ali Ayele, Silesh Bogale Haile, Grigori Sidorov, Alexander Gelbukh

    Abstract: Machine translation (MT) is one of the main tasks in natural language processing whose objective is to translate texts automatically from one natural language to another. Nowadays, using deep neural networks for MT tasks has received great attention. These networks require lots of data to learn abstract representations of the input and store it in continuous vectors. This paper presents the first…

    Submitted 27 October, 2022; originally announced October 2022.