Showing 1–1 of 1 results for author: Lohanimit, R

  1. arXiv:2412.01711  [pdf, other]

    cs.CL

    Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models

    Authors: Schrasing Tong, Eliott Zemour, Rawisara Lohanimit, Lalana Kagal

    Abstract: Although large language models (LLMs) have demonstrated their effectiveness in a wide range of applications, they have also been observed to perpetuate unwanted biases present in the training data, potentially leading to harm for marginalized communities. In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal that will be added to the…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Safe Generative AI Workshop