-
DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis
Authors:
Prashanth Vijayaraghavan,
Soroush Vosoughi,
Lamogha Chiazor,
Raya Horesh,
Rogerio Abreu de Paula,
Ehsan Degan,
Vandana Mukherjee
Abstract:
Recent advancements in large language models (LLMs) have revolutionized natural language processing (NLP) and expanded their applications across diverse domains. However, despite their impressive capabilities, LLMs have been shown to reflect and perpetuate harmful societal biases, including those based on ethnicity, gender, and religion. A critical and underexplored issue is the reinforcement of c…
▽ More
Recent advancements in large language models (LLMs) have revolutionized natural language processing (NLP) and expanded their applications across diverse domains. However, despite their impressive capabilities, LLMs have been shown to reflect and perpetuate harmful societal biases, including those based on ethnicity, gender, and religion. A critical and underexplored issue is the reinforcement of caste-based biases, particularly towards India's marginalized caste groups such as Dalits and Shudras. In this paper, we address this gap by proposing DECASTE, a novel, multi-dimensional framework designed to detect and assess both implicit and explicit caste biases in LLMs. Our approach evaluates caste fairness across four dimensions: socio-cultural, economic, educational, and political, using a range of customized prompting strategies. By benchmarking several state-of-the-art LLMs, we reveal that these models systematically reinforce caste biases, with significant disparities observed in the treatment of oppressed versus dominant caste groups. For example, bias scores are notably elevated when comparing Dalits and Shudras with dominant caste groups, reflecting societal prejudices that persist in model outputs. These results expose the subtle yet pervasive caste biases in LLMs and emphasize the need for more comprehensive and inclusive bias evaluation methodologies that assess the potential risks of deploying such models in real-world contexts.
△ Less
Submitted 4 June, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies
Authors:
Weiyan Shi,
Ryan Li,
Yutong Zhang,
Caleb Ziems,
Chunhua yu,
Raya Horesh,
Rogério Abreu de Paula,
Diyi Yang
Abstract:
To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank…
▽ More
To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank contains diverse views on cultural descriptors to allow flexible interpretation of cultural knowledge, and contextualized cultural scenarios to help grounded evaluation. With CultureBank, we evaluate different LLMs' cultural awareness, and identify areas for improvement. We also fine-tune a language model on CultureBank: experiments show that it achieves better performances on two downstream cultural tasks in a zero-shot setting. Finally, we offer recommendations based on our findings for future culturally aware language technologies. The project page is https://culturebank.github.io . The code and model is at https://github.com/SALT-NLP/CultureBank . The released CultureBank dataset is at https://huggingface.co/datasets/SALT-NLP/CultureBank .
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Authors:
Swapnaja Achintalwar,
Adriana Alvarado Garcia,
Ateret Anaby-Tavor,
Ioana Baldini,
Sara E. Berger,
Bishwaranjan Bhattacharjee,
Djallel Bouneffouf,
Subhajit Chaudhury,
Pin-Yu Chen,
Lamogha Chiazor,
Elizabeth M. Daly,
Kirushikesh DB,
Rogério Abreu de Paula,
Pierre Dognin,
Eitan Farchi,
Soumya Ghosh,
Michael Hind,
Raya Horesh,
George Kour,
Ja Young Lee,
Nishtha Madaan,
Sameep Mehta,
Erik Miehling,
Keerthiram Murugesan,
Manish Nagireddy
, et al. (13 additional authors not shown)
Abstract:
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we presen…
▽ More
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.
△ Less
Submitted 19 August, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.