Search | arXiv e-print repository

NLP for Social Good: A Survey of Challenges, Opportunities, and Responsible Deployment

Authors: Antonia Karamolegkou, Angana Borah, Eunjung Cho, Sagnik Ray Choudhury, Martina Galletti, Rajarshi Ghosh, Pranav Gupta, Oana Ignat, Priyanka Kargupta, Neema Kotonya, Hemank Lamba, Sun-Joo Lee, Arushi Mangla, Ishani Mondal, Deniz Nazarova, Poli Nemkova, Dina Pisarevskaya, Naquee Rizwan, Nazanin Sabri, Dominik Stammbach, Anna Steinberg, David Tomás, Steven R Wilson, Bowen Yi, Jessica H Zhu, et al. (7 additional authors not shown)

Abstract: Recent advancements in large language models (LLMs) have unlocked unprecedented possibilities across a range of applications. However, as a community, we believe that the field of Natural Language Processing (NLP) has a growing need to approach deployment with greater intentionality and responsibility. In alignment with the broader vision of AI for Social Good (Tomašev et al., 2020), this paper examines the role of NLP in addressing pressing societal challenges. Through a cross-disciplinary analysis of social goals and emerging risks, we highlight promising research directions and outline challenges that must be addressed to ensure responsible and equitable progress in NLP4SG research.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2501.16193

Posting Patterns of Members of Parental Subreddits

Authors: Nazanin Sabri, Mai Elsherief

Abstract: Online forums (e.g., Reddit) are used by many parents to discuss their challenges and needs and to receive support. While studies have investigated the contents of posts made to popular parental subreddits, revealing the family health concerns being expressed, little is known about parents' posting patterns or the other issues they engage with. In this study, we explore the posting activity of users of 55 parental subreddits. Exploring posts made by these users (667K) across Reddit (34M posts) reveals that over 85% of posters are not one-time users of Reddit and actively engage with the community. Studying cross-posting patterns also reveals that this population uses subreddits dedicated to other topics, such as relationship and health advice (e.g., r/AskDocs, r/relationship_advice). As a result, for a comprehensive understanding of the type of information posters share and seek, future work should investigate sub-communities beyond parental-specific ones. Finally, we expand the list of parental subreddits, compiling a total of 115 subreddits that could be used in future studies of parental concerns.
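The two measurements this abstract reports, the share of posters who are not one-time Reddit users and the non-parental subreddits they also post in, can be illustrated with a small sketch. It assumes a hypothetical list of (author, subreddit) post records and defines "one-time user" as an author with exactly one post; the paper's actual data pipeline and definitions may differ.

```python
from collections import Counter, defaultdict

def posting_summary(posts, parental_subs):
    """Summarize posting activity (hypothetical record layout).

    posts: list of (author, subreddit) pairs, one per post.
    parental_subs: set of subreddit names treated as parental.
    Returns (repeat_share, cross) where repeat_share is the fraction of
    authors with more than one post, and cross counts, for each
    non-parental subreddit, how many distinct authors posted there.
    """
    per_author = Counter(author for author, _ in posts)
    if not per_author:
        return 0.0, Counter()
    # Assumed definition: a one-time user has exactly one post.
    repeat_share = sum(1 for n in per_author.values() if n > 1) / len(per_author)

    # Collect distinct authors per subreddit outside the parental list.
    outside = defaultdict(set)
    for author, sub in posts:
        if sub not in parental_subs:
            outside[sub].add(author)
    cross = Counter({sub: len(authors) for sub, authors in outside.items()})
    return repeat_share, cross

posts = [("u1", "Parenting"), ("u1", "AskDocs"), ("u2", "Parenting"),
         ("u3", "daddit"), ("u3", "relationship_advice"), ("u3", "AskDocs")]
share, cross = posting_summary(posts, {"Parenting", "daddit"})
print(share, cross.most_common(2))
```

Counting distinct authors rather than raw posts keeps a single very active user from dominating the cross-posting ranking.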

Submitted 27 January, 2025; originally announced January 2025.

arXiv:2406.01866

#EpiTwitter: Public Health Messaging During the COVID-19 Pandemic

Authors: Ashwin Rao, Nazanin Sabri, Siyi Guo, Louiqa Raschid, Kristina Lerman

Abstract: Effective communication during health crises is critical, with social media serving as a key platform for public health experts (PHEs) to engage with the public. However, it also amplifies pseudo-experts promoting contrarian views. Despite its importance, the role of emotional and moral language in PHEs' communication during COVID-19 remains underexplored. This study examines how PHEs and pseudo-experts communicated on Twitter during the pandemic, focusing on emotional and moral language and their engagement with political elites. Analyzing tweets from 489 PHEs and 356 pseudo-experts from January 2020 to January 2021, alongside public responses, we identified key priorities and differences in messaging strategy. PHEs prioritize masking, healthcare, education, and vaccines, using positive emotional language like optimism. In contrast, pseudo-experts discuss therapeutics and lockdowns more frequently, employing negative emotions like pessimism and disgust. Negative emotional and moral language tends to drive engagement, but positive language from PHEs fosters positivity in public responses. PHEs exhibit liberal partisanship, expressing more positivity towards liberals and negativity towards conservative elites, while pseudo-experts show conservative partisanship. These findings shed light on the polarization of COVID-19 discourse and underscore the importance of strategic use of emotional and moral language by experts to mitigate polarization and enhance public trust.

Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2212.10839

Consistent Range Approximation for Fair Predictive Modeling

Authors: Jiongli Zhu, Sainyam Galhotra, Nazanin Sabri, Babak Salimi

Abstract: This paper proposes a novel framework for certifying the fairness of predictive models trained on biased data. It draws from query answering for incomplete and inconsistent databases to formulate the problem of consistent range approximation (CRA) of fairness queries for a predictive model on a target population. The framework employs background knowledge of the data collection process and biased data, working with or without limited statistics about the target population, to compute a range of answers for fairness queries. Using CRA, the framework builds predictive models that are certifiably fair on the target population, regardless of the availability of external data during training. The framework's efficacy is demonstrated through evaluations on real data, showing substantial improvement over existing state-of-the-art methods.

Submitted 28 July, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2211.08029

Persian Emotion Detection using ParsBERT and Imbalanced Data Handling Approaches

Authors: Amirhossein Abaskohi, Nazanin Sabri, Behnam Bahrak

Abstract: Emotion recognition is a machine learning application that can use text, speech, or image data gathered from social media. Detecting emotion is useful in many fields, including opinion mining. With the spread of social media, platforms like Twitter have become data sources, but the language used on these platforms is informal, which makes emotion detection difficult. EmoPars and ArmanEmo are two new human-labeled emotion datasets for the Persian language. These datasets, especially EmoPars, suffer from class imbalance, i.e., a large gap in the number of samples between classes. In this paper, we evaluate EmoPars and compare it with ArmanEmo. Throughout this analysis, we use data augmentation, data re-sampling, and class weights with Transformer-based Pretrained Language Models (PLMs) to handle the imbalance problem of these datasets. Moreover, feature selection is used to enhance the models' performance by emphasizing specific features of the text. In addition, we provide a new policy for selecting data from EmoPars that keeps only high-confidence samples; as a result, the model does not see samples without a specific emotion during training. Our model reaches Macro-averaged F1-scores of 0.81 and 0.76 on ArmanEmo and EmoPars, respectively, which are new state-of-the-art results on these benchmarks.
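Two of the ideas named in this abstract, class weights for an imbalanced label set and the Macro-averaged F1 used to report results, can be sketched in plain Python. The abstract does not give the paper's weighting formula, so the inverse-frequency "balanced" heuristic below (n_samples / (n_classes * count_c)) is an assumption chosen for illustration; macro F1 is computed as the unweighted mean of per-class F1.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Assumed heuristic for illustration; rare classes get larger weights
    so that a weighted loss pays more attention to them.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy imbalanced labels: "joy" is three times as frequent as "anger".
weights = balanced_class_weights(["joy"] * 6 + ["anger"] * 2)
print(weights)  # joy -> 8/12, anger -> 8/4 = 2.0
print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because macro F1 averages classes equally, a model that ignores a minority class is penalized heavily, which is why it is a common choice when reporting on imbalanced datasets such as those described here.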

Submitted 17 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 14 pages, 5 figures, 9 tables

Journal ref: ACM Transactions on Asian and Low-Resource Language Information Processing 2022

arXiv:2104.04770

UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

Authors: Alireza Salemi, Nazanin Sabri, Emad Kebriaei, Behnam Bahrak, Azadeh Shakery

Abstract: Detecting which parts of a sentence contribute to that sentence's toxicity, rather than providing a sentence-level verdict of hatefulness, would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents the methodology and results of our team, UTNLP, in SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best setting. The experiments start with keyword-based models, followed by attention-based, named-entity-based, transformer-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition's evaluation phase.
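Toxic spans are commonly represented as sets of character offsets, and span-level F1 compares predicted offsets to gold offsets per post before averaging over posts. The sketch below follows that scheme as an assumption; in particular, the convention that a post with no gold and no predicted offsets scores 1.0, while one empty side scores 0.0, is our understanding of the shared task's scoring and should be checked against the official evaluation script.

```python
def span_f1(pred, gold):
    """Character-offset F1 for a single post.

    Assumed convention: both sets empty -> 1.0; exactly one empty -> 0.0.
    """
    pred, gold = set(pred), set(gold)
    if not gold:
        return 1.0 if not pred else 0.0
    if not pred:
        return 0.0
    return 2 * len(pred & gold) / (len(pred) + len(gold))

def mean_span_f1(all_pred, all_gold):
    """Average the per-post F1 scores over a corpus."""
    scores = [span_f1(p, g) for p, g in zip(all_pred, all_gold)]
    return sum(scores) / len(scores)

text = "you are an idiot"
gold = set(range(11, 16))  # offsets of "idiot"
pred = set(range(8, 16))   # a model that also flags "an "
print(span_f1(pred, gold))  # 2*5 / (8+5) = 10/13, about 0.769
```

Scoring at the character level rewards partially correct spans, which suits the interpretability goal described above better than an all-or-nothing match on whole spans.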

Submitted 10 April, 2021; originally announced April 2021.

arXiv:2102.12700

Sentiment Analysis of Persian-English Code-mixed Texts

Authors: Nazanin Sabri, Ali Edalat, Behnam Bahrak

Abstract: The rapid production of data on the internet and the need to understand how users feel, from both a business and a research perspective, have prompted the creation of numerous automatic monolingual sentiment detection systems. More recently, however, due to the unstructured nature of data on social media, we are observing more instances of multilingual and code-mixed texts. This development in content type has created a new demand for code-mixed sentiment analysis systems. In this study, we collect and label Persian-English code-mixed tweets, creating a new dataset. We then introduce a model that uses pretrained BERT embeddings as well as translation models to automatically learn the polarity scores of these tweets. Our model outperforms baseline models that use Naïve Bayes and Random Forest methods.

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2012.03932

Investigating the effects of Goodreads challenges on individuals reading habits

Authors: Yasaman Jafari, Nazanin Sabri, Behnam Bahrak

Abstract: Sharing our goals with others and setting public challenges for ourselves has been the center of many discussions. This study examines reading challenges, how participation in them has changed over the years, and how they influence users' reading productivity. To do so, we analyze Goodreads, a social book cataloging website with a yearly challenge feature. We further show that gender is a significant factor in how successful individuals are in their challenges. Additionally, we investigate the association between participation in reading challenges and the number of books people read.

Submitted 15 February, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

arXiv:1902.03486

Machine Learning Forcefield for Silicate Glasses

Authors: Han Liu, Zipeng Fu, Yipeng Li, Nazreen Farina Ahmad Sabri, Mathieu Bauchy

Abstract: Developing accurate, transferable, and computationally efficient interatomic forcefields is key to facilitating the modeling of silicate glasses. However, the large number of forcefield parameters that need to be optimized renders traditional parameterization methods inefficient or potentially subject to bias. Here, we present a new forcefield parameterization methodology based on ab initio molecular dynamics simulations, Gaussian process regression, and Bayesian optimization. Taking glassy silica as an example, we show that our methodology yields a new interatomic forcefield that offers an unprecedented description of the atomic structure of silica. This methodology offers a new route to efficiently parameterize new empirical interatomic forcefields for silicate glasses with very limited need for intuition.

Submitted 9 February, 2019; originally announced February 2019.

Showing 1–9 of 9 results for author: Sabri, N