Showing 1–1 of 1 results for author: Peperzak, J

Search v0.5.6 released 2020-02-24

arXiv:2208.01341

cs.CL cs.CY

Gender bias in (non)-contextual clinical word embeddings for stereotypical medical categories

Authors: Gizem Sogancioglu, Fabian Mijsters, Amar van Uden, Jelle Peperzak

Abstract: Clinical word embeddings are extensively used in various Bio-NLP problems as a state-of-the-art feature vector representation. Although they are quite successful at the semantic representation of words, they might exhibit gender stereotypes because the datasets on which they are trained potentially carry statistical and societal bias. This study analyzes gender bias of clinical embeddings on three medical categories: mental disorders, sexually transmitted diseases, and personality traits. To this end, we analyze two different pre-trained embeddings, namely (contextualized) clinical-BERT and (non-contextualized) BioWordVec. We show that both embeddings are biased towards sensitive gender groups, but BioWordVec exhibits a higher bias than clinical-BERT for all three categories. Moreover, our analyses show that clinical embeddings carry a high degree of bias for some medical terms and diseases, which conflicts with the medical literature. Such ill-founded relationships might cause harm in downstream applications that use clinical embeddings.

Submitted 8 August, 2022; v1 submitted 2 August, 2022; originally announced August 2022.
