Showing 1–33 of 33 results for author: KhudaBukhsh, A

  1. arXiv:2504.06160  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG cs.SI

    Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups

    Authors: Rijul Magu, Arka Dutta, Sean Kim, Ashiqur R. KhudaBukhsh, Munmun De Choudhury

    Abstract: Large Language Models (LLMs) have been shown to demonstrate imbalanced biases against certain groups. However, the study of unprovoked targeted attacks by LLMs towards at-risk populations remains underexplored. Our paper presents three novel contributions: (1) the explicit evaluation of LLM-generated attacks on highly vulnerable mental health groups; (2) a network-based framework to study the prop…

    Submitted 11 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

    ACM Class: J.4; K.4.1; K.4.2

  2. arXiv:2503.21513  [pdf, other]

    cs.CL

    Datasets for Depression Modeling in Social Media: An Overview

    Authors: Ana-Maria Bucur, Andreea-Codrina Moldovan, Krutika Parvatikar, Marcos Zampieri, Ashiqur R. KhudaBukhsh, Liviu P. Dinu

    Abstract: Depression is the most common mental health disorder, and its prevalence increased during the COVID-19 pandemic. As one of the most extensively researched psychological conditions, recent research has increasingly focused on leveraging social media data to enhance traditional methods of depression screening. This paper addresses the growing interest in interdisciplinary research on depression, and…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted to CLPsych Workshop, NAACL 2025

  3. arXiv:2502.09004  [pdf, other]

    cs.CL cs.CY cs.LG

    Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech

    Authors: Jonathan Pofcher, Christopher M. Homan, Randall Sell, Ashiqur R. KhudaBukhsh

    Abstract: This paper makes three contributions. First, via a substantial corpus of 1,419,047 comments posted on 3,161 YouTube news videos of major US cable news outlets, we analyze how users engage with LGBTQ+ news content. Our analyses focus both on positive and negative content. In particular, we construct a fine-grained hope speech classifier that detects positive (hope speech), negative, neutral, and ir…

    Submitted 13 February, 2025; originally announced February 2025.

  4. arXiv:2410.09978  [pdf, other]

    cs.CL cs.CY

    When Neutral Summaries are not that Neutral: Quantifying Political Neutrality in LLM-Generated News Summaries

    Authors: Supriti Vijay, Aman Priyanshu, Ashique R. KhudaBukhsh

    Abstract: In an era where societal narratives are increasingly shaped by algorithmic curation, investigating the political neutrality of LLMs is an important research question. This study presents a fresh perspective on quantifying the political neutrality of LLMs through the lens of abstractive text summarization of polarizing news articles. We consider five pressing issues in current US politics: abortion…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: 12 pages, 3 figures, 4 tables

  5. arXiv:2410.08793  [pdf, ps, other]

    cs.CL

    On the State of NLP Approaches to Modeling Depression in Social Media: A Post-COVID-19 Outlook

    Authors: Ana-Maria Bucur, Andreea-Codrina Moldovan, Krutika Parvatikar, Marcos Zampieri, Ashiqur R. KhudaBukhsh, Liviu P. Dinu

    Abstract: Computational approaches to predicting mental health conditions in social media have been substantially explored in the past years. Multiple reviews have been published on this topic, providing the community with comprehensive accounts of the research in this area. Among all mental health conditions, depression is the most widely studied due to its worldwide prevalence. The COVID-19 global pandemi…

    Submitted 7 March, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

  6. arXiv:2409.12218  [pdf, other]

    cs.CL cs.LG

    ARTICLE: Annotator Reliability Through In-Context Learning

    Authors: Sujan Dutta, Deepak Pandita, Tharindu Cyril Weerasooriya, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh

    Abstract: Ensuring annotator quality in training and evaluation data is a key piece of machine learning in NLP. Tasks such as sentiment analysis and offensive speech detection are intrinsically subjective, creating a challenging scenario for traditional quality assessment approaches because it is hard to distinguish disagreement due to poor work from that due to differences of opinions between sincere annot…

    Submitted 19 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

  7. arXiv:2409.12194  [pdf, other]

    cs.CL cs.CY

    Gender Representation and Bias in Indian Civil Service Mock Interviews

    Authors: Somonnoy Banerjee, Sujan Dutta, Soumyajit Datta, Ashiqur R. KhudaBukhsh

    Abstract: This paper makes three key contributions. First, via a substantial corpus of 51,278 interview questions sourced from 888 YouTube videos of mock interviews of Indian civil service candidates, we demonstrate stark gender bias in the broad nature of questions asked to male and female candidates. Second, our experiments with large language models show a strong presence of gender bias in explanations p…

    Submitted 20 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

  8. arXiv:2408.08411  [pdf, other]

    cs.CL

    Rater Cohesion and Quality from a Vicarious Perspective

    Authors: Deepak Pandita, Tharindu Cyril Weerasooriya, Sujan Dutta, Sarah K. Luger, Tharindu Ranasinghe, Ashiqur R. KhudaBukhsh, Marcos Zampieri, Christopher M. Homan

    Abstract: Human feedback is essential for building human-centered AI systems across domains where disagreement is prevalent, such as AI safety, content moderation, or sentiment analysis. Many disagreements, particularly in politically charged settings, arise because raters have opposing values or beliefs. Vicarious annotation is a method for breaking down disagreement by asking raters how they think others…

    Submitted 4 October, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted at EMNLP 2024 Findings

  9. arXiv:2404.11752  [pdf]

    cs.CL cs.CY

    Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions

    Authors: Nazia Tasnim, Sujan Sen Gupta, Md. Istiak Hossain Shihab, Fatiha Islam Juee, Arunima Tahsin, Pritom Ghum, Kanij Fatema, Marshia Haque, Wasema Farzana, Prionti Nasir, Ashique KhudaBukhsh, Farig Sadeque, Asif Sushmit

    Abstract: Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies exhibit a phenomenon characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the firs…

    Submitted 17 April, 2024; originally announced April 2024.

  10. arXiv:2403.13272  [pdf, other]

    cs.CY cs.CL cs.SI

    Community Needs and Assets: A Computational Analysis of Community Conversations

    Authors: Md Towhidul Absar Chowdhury, Naveen Sharma, Ashiqur R. KhudaBukhsh

    Abstract: A community needs assessment is a tool used by non-profits and government agencies to quantify the strengths and issues of a community, allowing them to allocate their resources better. Such approaches are transitioning towards leveraging social media conversations to analyze the needs of communities and the assets already present within them. However, manual analysis of exponentially increasing s…

    Submitted 19 March, 2024; originally announced March 2024.

  11. arXiv:2402.13528  [pdf, other]

    cs.CY cs.CL cs.LG cs.SI

    Infrastructure Ombudsman: Mining Future Failure Concerns from Structural Disaster Response

    Authors: Md Towhidul Absar Chowdhury, Soumyajit Datta, Naveen Sharma, Ashiqur R. KhudaBukhsh

    Abstract: Current research concentrates on studying discussions on social media related to structural failures to improve disaster response strategies. However, detecting social web posts discussing concerns about anticipatory failures is under-explored. If such concerns are channeled to the appropriate authorities, it can aid in the prevention and mitigation of potential infrastructural failures. In this p…

    Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  12. arXiv:2310.07078  [pdf, other]

    cs.LG cs.AI cs.CL

    Auditing and Robustifying COVID-19 Misinformation Datasets via Anticontent Sampling

    Authors: Clay H. Yoo, Ashiqur R. KhudaBukhsh

    Abstract: This paper makes two key contributions. First, it argues that highly specialized rare content classifiers trained on small data typically have limited exposure to the richness and topical diversity of the negative class (dubbed anticontent) as observed in the wild. As a result, these classifiers' strong performance observed on the test set may not translate into real-world settings. In the context…

    Submitted 5 August, 2023; originally announced October 2023.

    Comments: This paper has been accepted at AAAI 2023 (Robust and Safe AI track)

  13. arXiv:2309.06415  [pdf, other]

    cs.CL cs.CY

    Down the Toxicity Rabbit Hole: A Novel Framework to Bias Audit Large Language Models

    Authors: Arka Dutta, Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh

    Abstract: This paper makes three contributions. First, it presents a generalizable, novel framework dubbed "toxicity rabbit hole" that iteratively elicits toxic content from a wide suite of large language models. Spanning a set of 1,266 identity groups, we first conduct a bias audit of PaLM 2 guardrails presenting key insights. Next, we report generalizability across several other models. Th…

    Submitted 30 March, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

  14. arXiv:2307.10200  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings

    Authors: Sujan Dutta, Parth Srivastava, Vaishnavi Solunke, Swaprava Nath, Ashiqur R. KhudaBukhsh

    Abstract: Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerg…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: This paper is accepted at IJCAI 2023 (AI for good track)

  15. arXiv:2307.10189  [pdf, other]

    cs.IR cs.CL cs.SI

    Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning

    Authors: Tharindu Cyril Weerasooriya, Sarah Luger, Saloni Poddar, Ashiqur R. KhudaBukhsh, Christopher M. Homan

    Abstract: Human-annotated data plays a critical role in the fairness of AI systems, including those that deal with life-altering decisions or moderating human-created web/social media content. Conventionally, annotator disagreements are resolved before any learning takes place. However, researchers are increasingly identifying annotator disagreement as pervasive and meaningful. They also question the perfor…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted for Publication at ACL 2023

  16. arXiv:2307.03764  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    For Women, Life, Freedom: A Participatory AI-Based Social Web Analysis of a Watershed Moment in Iran's Gender Struggles

    Authors: Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh

    Abstract: In this paper, we present a computational analysis of the Persian language Twitter discourse with the aim to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody. We present an ensemble active learning pipeline to train a stance classifier. Our novelty lies in the involvement of Iranian women in an active role as annotators in building this AI sy…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted at IJCAI 2023 (AI for good track)

  17. arXiv:2303.17201  [pdf, other]

    cs.CY

    Quantifying the Academic Quality of Children's Videos using Machine Comprehension

    Authors: Sumeet Kumar, Mallikarjuna T., Ashiqur Khudabukhsh

    Abstract: YouTube Kids (YTK) is one of the most popular kids' applications used by millions of kids daily. However, various studies have highlighted concerns about the videos on the platform, like the over-presence of entertaining and commercial content. YouTube recently proposed high-quality guidelines that include 'promoting learning' and proposed to use it in ranking channels. However, the concept of lea…

    Submitted 5 February, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

  18. arXiv:2301.12534  [pdf, other]

    cs.CL cs.CY cs.LG

    Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive

    Authors: Tharindu Cyril Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh

    Abstract: Offensive speech detection is a key component of content moderation. However, what is offensive can be highly subjective. This paper investigates how machine and human moderators disagree on what is offensive when it comes to real-world social web political discourse. We show that (1) there is extensive disagreement among the moderators (humans and machines); and (2) human and large-language-model…

    Submitted 9 November, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: Accepted to appear at EMNLP 2023

  19. arXiv:2206.10594  [pdf]

    cs.SI

    How is Vaping Framed on Online Knowledge Dissemination Platforms?

    Authors: Keyu Chen, Yiwen Shi, Jun Luo, Joyce Jiang, Shweta Yadav, Munmun De Choudhury, Ashiqur R. KhudaBukhsh, Marzieh Babaeianjelodar, Frederick Altice, Navin Kumar

    Abstract: We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to understand these differences. For example, n-grams, emotion recognition, and question answering results indicate that Medium, Quora, and Stack Exchange are appropriate venue…

    Submitted 22 July, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2206.07765, arXiv:2206.09024

  20. arXiv:2206.09024  [pdf]

    cs.SI

    Partisan US News Media Representations of Syrian Refugees

    Authors: Keyu Chen, Marzieh Babaeianjelodar, Yiwen Shi, Kamila Janmohamed, Rupak Sarkar, Ingmar Weber, Thomas Davidson, Munmun De Choudhury, Jonathan Huang, Shweta Yadav, Ashique Khudabukhsh, Preslav Ivanov Nakov, Chris Bauch, Orestis Papakyriakopoulos, Kaveh Khoshnood, Navin Kumar

    Abstract: We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences. Our polarization and question answering results indicated that left-leaning media t…

    Submitted 17 June, 2022; originally announced June 2022.

  21. arXiv:2206.07765  [pdf]

    cs.SI

    US News and Social Media Framing around Vaping

    Authors: Keyu Chen, Marzieh Babaeianjelodar, Yiwen Shi, Rohan Aanegola, Lam Yin Cheung, Preslav Ivanov Nakov, Shweta Yadav, Angus Bancroft, Ashiqur R. KhudaBukhsh, Munmun De Choudhury, Frederick L. Altice, Navin Kumar

    Abstract: In this paper, we investigate how vaping is framed differently (2008-2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news me…

    Submitted 22 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

  22. arXiv:2203.04837  [pdf, other]

    eess.AS cs.CL cs.CY

    'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube

    Authors: Krithika Ramesh, Ashiqur R. KhudaBukhsh, Sumeet Kumar

    Abstract: Over the last few years, YouTube Kids has emerged as one of the highly competitive alternatives to television for children's entertainment. Consequently, YouTube Kids' content should receive an additional level of scrutiny to ensure children's safety. While research on detecting offensive or inappropriate content for kids is gaining momentum, little or no current work exists that investigates to w…

    Submitted 17 February, 2022; originally announced March 2022.

    Comments: This paper got accepted at AAAI 2022, AI for Social Impact track

  23. arXiv:2106.12044  [pdf, other]

    cs.SI cs.CY

    Empathy and Hope: Resource Transfer to Model Inter-country Social Media Dynamics

    Authors: Clay H. Yoo, Shriphani Palakodety, Rupak Sarkar, Ashiqur R. KhudaBukhsh

    Abstract: The ongoing COVID-19 pandemic resulted in significant ramifications for international relations ranging from travel restrictions, global ceasefires, and international vaccine production and sharing agreements. Amidst a wave of infections in India that resulted in a systemic breakdown of healthcare infrastructure, a social welfare organization based in Pakistan offered to procure medical-grade oxyg…

    Submitted 17 June, 2021; originally announced June 2021.

  24. arXiv:2104.05611  [pdf, other]

    cs.SI cs.CY

    Exploring Polarization of Users Behavior on Twitter During the 2019 South American Protests

    Authors: Ramon Villa-Cox, Helen Zeng, Ashiqur R. KhudaBukhsh, Kathleen M. Carley

    Abstract: Research across different disciplines has documented the expanding polarization in social media. However, much of it focused on the US political system or its culturally controversial topics. In this work, we explore polarization on Twitter in a different context, namely the protest that paralyzed several countries in the South American region in 2019. By leveraging users' endorsement of politicia…

    Submitted 5 April, 2021; originally announced April 2021.

  25. arXiv:2102.09103  [pdf, other]

    cs.CY

    Gender Bias, Social Bias and Representation: 70 Years of B$^H$ollywood

    Authors: Kunal Khadilkar, Ashiqur R. KhudaBukhsh, Tom M. Mitchell

    Abstract: With an outreach in more than 90 countries, a market share of 2.1 billion dollars and a target audience base of at least 1.2 billion people, Bollywood, aka the Mumbai film industry, is a formidable entertainment force. While the number of lives Bollywood can potentially touch is massive, no comprehensive NLP study on the evolution of social and gender biases in Bollywood dialogues exists. Via a su…

    Submitted 17 February, 2021; originally announced February 2021.

  26. arXiv:2101.10112  [pdf, other]

    cs.CY cs.CL

    Fringe News Networks: Dynamics of US News Viewership following the 2020 Presidential Election

    Authors: Ashiqur R. KhudaBukhsh, Rupak Sarkar, Mark S. Kamlet, Tom M. Mitchell

    Abstract: The growing political polarization of the American electorate over the last several decades has been widely studied and documented. During the administration of President Donald Trump, charges of "fake news" made social and news media not only the means but, to an unprecedented extent, the topic of political communication. Using data from before the November 3rd, 2020 US Presidential election, rec…

    Submitted 21 January, 2021; originally announced January 2021.

  27. arXiv:2011.10280  [pdf, ps, other]

    cs.CL

    Are Chess Discussions Racist? An Adversarial Hate Speech Data Set

    Authors: Rupak Sarkar, Ashiqur R. KhudaBukhsh

    Abstract: On June 28, 2020, while presenting a chess podcast on Grandmaster Hikaru Nakamura, Antonio Radić's YouTube handle got blocked because it contained "harmful and dangerous" content. YouTube did not give further specific reason, and the channel got reinstated within 24 hours. However, Radić speculated that given the current political situation, a referral to "black against white", albeit in the conte…

    Submitted 20 November, 2020; originally announced November 2020.

  28. arXiv:2010.02339  [pdf, ps, other]

    cs.CL cs.CY

    We Don't Speak the Same Language: Interpreting Polarization through Machine Translation

    Authors: Ashiqur R. KhudaBukhsh, Rupak Sarkar, Mark S. Kamlet, Tom M. Mitchell

    Abstract: Polarization among US political parties, media and elites is a widely studied topic. Prominent lines of prior research across multiple disciplines have observed and analyzed growing polarization in social media. In this paper, we present a new methodology that offers a fresh perspective on interpreting polarization through the lens of machine translation. With a novel proposition that two sub-comm…

    Submitted 18 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

  29. arXiv:2008.13347  [pdf, other]

    cs.CL cs.CY cs.LG

    Discovering Bilingual Lexicons in Polyglot Word Embeddings

    Authors: Ashiqur R. KhudaBukhsh, Shriphani Palakodety, Tom M. Mitchell

    Abstract: Bilingual lexicons and phrase tables are critical resources for modern Machine Translation systems. Although recent results show that without any seed lexicon or parallel data, highly accurate bilingual lexicons can be learned using unsupervised methods, such methods rely on the existence of large, clean monolingual corpora. In this work, we utilize a single Skip-gram model trained on a multilingu…

    Submitted 30 August, 2020; originally announced August 2020.

  30. arXiv:2001.11258  [pdf, ps, other]

    cs.CL cs.CY cs.LG

    Harnessing Code Switching to Transcend the Linguistic Barrier

    Authors: Ashiqur R. KhudaBukhsh, Shriphani Palakodety, Jaime G. Carbonell

    Abstract: Code mixing (or code switching) is a common phenomenon observed in social-media content generated by a linguistically diverse user-base. Studies show that in the Indian sub-continent, a substantial fraction of social media posts exhibit code switching. While the difficulties posed by code mixed documents to further downstream analyses are well-understood, lending visibility to code mixed documents…

    Submitted 15 June, 2020; v1 submitted 30 January, 2020; originally announced January 2020.

  31. arXiv:2001.01697  [pdf, other]

    cs.CY cs.LG

    Social Media Attributions in the Context of Water Crisis

    Authors: Rupak Sarkar, Hirak Sarkar, Sayantan Mahinder, Ashiqur R. KhudaBukhsh

    Abstract: Attribution of natural disasters/collective misfortune is a widely-studied political science problem. However, such studies are typically survey-centric or rely on a handful of experts to weigh in on the matter. In this paper, we explore how we can use social media data and an AI-driven approach to complement traditional surveys and automatically extract attribution factors. We focus on the most-r…

    Submitted 6 January, 2020; originally announced January 2020.

  32. arXiv:1910.03206  [pdf, ps, other]

    cs.CY cs.CL cs.IR cs.LG

    Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas

    Authors: Shriphani Palakodety, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell

    Abstract: The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times with more than 600,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has been performed on social media pertaining to this large evolving crisis. In this work, we construct a substanti…

    Submitted 6 January, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

  33. arXiv:1909.12940  [pdf, ps, other]

    cs.CY cs.CL cs.LG

    Hope Speech Detection: A Computational Analysis of the Voice of Peace

    Authors: Shriphani Palakodety, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell

    Abstract: The recent Pulwama terror attack (February 14, 2019, Pulwama, Kashmir) triggered a chain of escalating events between India and Pakistan adding another episode to their 70-year-old dispute over Kashmir. The present era of ubiquitous social media has never seen nuclear powers closer to war. In this paper, we analyze this evolving international crisis via a substantial corpus constructed using comm…

    Submitted 24 February, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: Minor edits