Search | arXiv e-print repository

Opioid Named Entity Recognition (ONER-2025) from Reddit

Authors: Grigori Sidorov, Muhammad Ahmad, Iqra Ameer, Muhammad Usman, Ildar Batyrshin

Abstract: The opioid overdose epidemic remains a critical public health crisis, particularly in the United States, leading to significant mortality and societal costs. Social media platforms like Reddit provide vast amounts of unstructured data that offer insights into public perceptions, discussions, and experiences related to opioid use. This study leverages Natural Language Processing (NLP), specifically… ▽ More The opioid overdose epidemic remains a critical public health crisis, particularly in the United States, leading to significant mortality and societal costs. Social media platforms like Reddit provide vast amounts of unstructured data that offer insights into public perceptions, discussions, and experiences related to opioid use. This study leverages Natural Language Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to extract actionable information from these platforms. Our research makes four key contributions. First, we created a unique, manually annotated dataset sourced from Reddit, where users share self-reported experiences of opioid use via different administration routes. This dataset contains 331,285 tokens and includes eight major opioid entity categories. Second, we detail our annotation process and guidelines while discussing the challenges of labeling the ONER-2025 dataset. Third, we analyze key linguistic challenges, including slang, ambiguity, fragmented sentences, and emotionally charged language, in opioid discussions. Fourth, we propose a real-time monitoring system to process streaming data from social media, healthcare records, and emergency services to identify overdose events. Using 5-fold cross-validation in 11 experiments, our system integrates machine learning, deep learning, and transformer-based language models with advanced contextual embeddings to enhance understanding. Our transformer-based models (bert-base-NER and roberta-base) achieved 97% accuracy and F1-score, outperforming baselines by 10.23% (RF=0.88). △ Less

Submitted 30 April, 2025; v1 submitted 28 March, 2025; originally announced April 2025.

arXiv:2503.10688 [pdf, other]

CULEMO: Cultural Lenses on Emotion -- Benchmarking LLMs for Cross-Cultural Emotion Understanding

Authors: Tadesse Destaw Belay, Ahmed Haj Ahmed, Alvin Grissom II, Iqra Ameer, Grigori Sidorov, Olga Kolesnikova, Seid Muhie Yimam

Abstract: NLP research has increasingly focused on subjective tasks such as emotion analysis. However, existing emotion benchmarks suffer from two major shortcomings: (1) they largely rely on keyword-based emotion recognition, overlooking crucial cultural dimensions required for deeper emotion understanding, and (2) many are created by translating English-annotated data into other languages, leading to pote… ▽ More NLP research has increasingly focused on subjective tasks such as emotion analysis. However, existing emotion benchmarks suffer from two major shortcomings: (1) they largely rely on keyword-based emotion recognition, overlooking crucial cultural dimensions required for deeper emotion understanding, and (2) many are created by translating English-annotated data into other languages, leading to potentially unreliable evaluation. To address these issues, we introduce Cultural Lenses on Emotion (CuLEmo), the first benchmark designed to evaluate culture-aware emotion prediction across six languages: Amharic, Arabic, English, German, Hindi, and Spanish. CuLEmo comprises 400 crafted questions per language, each requiring nuanced cultural reasoning and understanding. We use this benchmark to evaluate several state-of-the-art LLMs on culture-aware emotion prediction and sentiment analysis tasks. Our findings reveal that (1) emotion conceptualizations vary significantly across languages and cultures, (2) LLMs performance likewise varies by language and cultural context, and (3) prompting in English with explicit country context often outperforms in-language prompts for culture-aware emotion and sentiment understanding. The dataset and evaluation code are publicly available. △ Less

Submitted 27 May, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

Comments: ACL-main 2025

arXiv:2207.06223 [pdf, other]

Job Offers Classifier using Neural Networks and Oversampling Methods

Authors: Germán Ortiz, Gemma Bel Enguix, Helena Gómez-Adorno, Iqra Ameer, Grigori Sidorov

Abstract: Both policy and research benefit from a better understanding of individuals' jobs. However, as large-scale administrative records are increasingly employed to represent labor market activity, new automatic methods to classify jobs will become necessary. We developed an automatic job offers classifier using a dataset collected from the largest job bank of Mexico known as Bumeran https://www.bumeran… ▽ More Both policy and research benefit from a better understanding of individuals' jobs. However, as large-scale administrative records are increasingly employed to represent labor market activity, new automatic methods to classify jobs will become necessary. We developed an automatic job offers classifier using a dataset collected from the largest job bank of Mexico known as Bumeran https://www.bumeran.com.mx/ Last visited: 19-01-2022.. We applied machine learning algorithms such as Support Vector Machines, Naive-Bayes, Logistic Regression, Random Forest, and deep learning Long-Short Term Memory (LSTM). Using these algorithms, we trained multi-class models to classify job offers in one of the 23 classes (not uniformly distributed): Sales, Administration, Call Center, Technology, Trades, Human Resources, Logistics, Marketing, Health, Gastronomy, Financing, Secretary, Production, Engineering, Education, Design, Legal, Construction, Insurance, Communication, Management, Foreign Trade, and Mining. We used the SMOTE, Geometric-SMOTE, and ADASYN synthetic oversampling algorithms to handle imbalanced classes. The proposed convolutional neural network architecture achieved the best results when applied the Geometric-SMOTE algorithm. △ Less

Submitted 3 July, 2022; originally announced July 2022.

Comments: 13 pages, 2 figures, 8th World Conference On Soft Computing

arXiv:2207.01012 [pdf, other]

Mental Illness Classification on Social Media Texts using Deep Learning and Transfer Learning

Authors: Iqra Ameer, Muhammad Arif, Grigori Sidorov, Helena Gòmez-Adorno, Alexander Gelbukh

Abstract: Given the current social distance restrictions across the world, most individuals now use social media as their major medium of communication. Millions of people suffering from mental diseases have been isolated due to this, and they are unable to get help in person. They have become more reliant on online venues to express themselves and seek advice on dealing with their mental disorders. Accordi… ▽ More Given the current social distance restrictions across the world, most individuals now use social media as their major medium of communication. Millions of people suffering from mental diseases have been isolated due to this, and they are unable to get help in person. They have become more reliant on online venues to express themselves and seek advice on dealing with their mental disorders. According to the World health organization (WHO), approximately 450 million people are affected. Mental illnesses, such as depression, anxiety, etc., are immensely common and have affected an individuals' physical health. Recently Artificial Intelligence (AI) methods have been presented to help mental health providers, including psychiatrists and psychologists, in decision making based on patients' authentic information (e.g., medical records, behavioral data, social media utilization, etc.). AI innovations have demonstrated predominant execution in numerous real-world applications broadening from computer vision to healthcare. This study analyzes unstructured user data on the Reddit platform and classifies five common mental illnesses: depression, anxiety, bipolar disorder, ADHD, and PTSD. We trained traditional machine learning, deep learning, and transfer learning multi-class models to detect mental disorders of individuals. This effort will benefit the public health system by automating the detection process and informing appropriate authorities about people who require emergency assistance. △ Less

Submitted 3 July, 2022; originally announced July 2022.

Comments: 11 pages, 2 figures, 8th World Conference On Soft Computing

arXiv:2204.11714 [pdf, other]

The Causal News Corpus: Annotating Causal Relations in Event Sentences from News

Authors: Fiona Anting Tan, Ali Hürriyetoğlu, Tommaso Caselli, Nelleke Oostdijk, Tadashi Nomoto, Hansi Hettiarachchi, Iqra Ameer, Onur Uca, Farhana Ferdousi Liza, Tiancheng Hu

Abstract: Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event c… ▽ More Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether it contains causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on test set, and 83.46% in 5-folds cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers. △ Less

Submitted 25 April, 2022; originally announced April 2022.

Comments: Accepted to LREC 2022

arXiv:1805.09169 [pdf]

A distinct approach to diagnose Dengue Fever with the help of Soft Set Theory

Authors: Maaz Amjad, fariha Bukhari, Iqra Ameer, Alexander Gelbukh

Abstract: Mathematics has played a substantial role to revolutionize the medical science. Intelligent systems based on mathematical theories have proved to be efficient in diagnosing various diseases. In this paper, we used an expert system based on soft set theory and fuzzy set theory named as a soft expert system to diagnose tropical disease dengue. The objective to use soft expert system is to predict th… ▽ More Mathematics has played a substantial role to revolutionize the medical science. Intelligent systems based on mathematical theories have proved to be efficient in diagnosing various diseases. In this paper, we used an expert system based on soft set theory and fuzzy set theory named as a soft expert system to diagnose tropical disease dengue. The objective to use soft expert system is to predict the risk level of a patient having dengue fever by using input variables like age, TLC, SGOT, platelets count and blood pressure. The proposed method explicitly demonstrates the exact percentage of the risk level of dengue fever automatically circumventing for all possible (medical) imprecisions. △ Less

Submitted 21 November, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: 10 pages

Showing 1–6 of 6 results for author: Ameer, I