-
PANDORA Talks: Personality and Demographics on Reddit
Abstract: Personality and demographics are important variables in social sciences, while in NLP they can aid in interpretability and removal of societal biases. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models (including the well-established Big 5 model) and d…
Submitted 8 June, 2021; v1 submitted 9 April, 2020; originally announced April 2020.
Comments: Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media, NAACL 2021, https://www.aclweb.org/anthology/2021.socialnlp-1.12
-
arXiv:1811.04655
Not Just Depressed: Bipolar Disorder Prediction on Reddit
Abstract: Bipolar disorder, an illness characterized by manic and depressive episodes, affects more than 60 million people worldwide. We present a preliminary study on bipolar disorder prediction from user-generated text on Reddit, which relies on users' self-reported labels. Our benchmark classifiers for bipolar disorder prediction outperform the baselines and reach accuracy and F1-scores of above 86%. Fea…
Submitted 27 March, 2019; v1 submitted 12 November, 2018; originally announced November 2018.
Comments: WASSA at EMNLP 2018
Journal ref: WASSA@EMNLP 2018: 72-78