-
A Survey on Multilingual Mental Disorders Detection from Social Media Data
Authors:
Ana-Maria Bucur,
Marcos Zampieri,
Tharindu Ranasinghe,
Fabio Crestani
Abstract:
The increasing prevalence of mental health disorders globally highlights the urgent need for effective digital screening methods that can be used in multilingual contexts. Most existing studies, however, focus on English data, overlooking critical mental health signals that may be present in non-English texts. To address this important gap, we present the first survey on the detection of mental health disorders using multilingual social media data. We investigate the cultural nuances that influence online language patterns and self-disclosure behaviors, and how these factors can impact the performance of NLP tools. Additionally, we provide a comprehensive list of multilingual data collections that can be used for developing NLP models for mental health screening. Our findings can inform the design of effective multilingual mental health screening tools that can meet the needs of diverse populations, ultimately improving mental health outcomes on a global scale.
Submitted 21 May, 2025;
originally announced May 2025.
-
Datasets for Depression Modeling in Social Media: An Overview
Authors:
Ana-Maria Bucur,
Andreea-Codrina Moldovan,
Krutika Parvatikar,
Marcos Zampieri,
Ashiqur R. KhudaBukhsh,
Liviu P. Dinu
Abstract:
Depression is the most common mental health disorder, and its prevalence increased during the COVID-19 pandemic. It is one of the most extensively researched psychological conditions, and recent research has increasingly focused on leveraging social media data to enhance traditional methods of depression screening. This paper addresses the growing interest in interdisciplinary research on depression, and aims to support early-career researchers by providing a comprehensive and up-to-date list of datasets for analyzing and predicting depression through social media data. We present an overview of datasets published between 2019 and 2024. We also make the comprehensive list of datasets available online as a continuously updated resource, with the hope that it will facilitate further interdisciplinary research into the linguistic expressions of depression on social media.
Submitted 27 March, 2025;
originally announced March 2025.
-
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
Authors:
Shamsuddeen Hassan Muhammad,
Nedjma Ousidhoum,
Idris Abdulmumin,
Jan Philip Wahle,
Terry Ruas,
Meriem Beloucif,
Christine de Kock,
Nirmal Surange,
Daniela Teodorescu,
Ibrahim Said Ahmad,
David Ifeoluwa Adelani,
Alham Fikri Aji,
Felermino D. M. A. Ali,
Ilseyar Alimova,
Vladimir Araujo,
Nikolay Babakov,
Naomi Baes,
Ana-Maria Bucur,
Andiswa Bukula,
Guanqun Cao,
Rodrigo Tufino Cardenas,
Rendi Chevi,
Chiamaka Ijeoma Chukwuneke,
Alexandra Ciobotaru,
Daryna Dementieva,
et al. (23 additional authors not shown)
Abstract:
People worldwide use language in subtle and complex ways to express emotions. Although emotion recognition--an umbrella term for several NLP tasks--impacts various applications within NLP and beyond, most work in this area has focused on high-resource languages. This has led to significant disparities in research efforts and proposed solutions, particularly for under-resourced languages, which often lack high-quality annotated datasets. In this paper, we present BRIGHTER--a collection of multi-labeled, emotion-annotated datasets in 28 different languages and across several domains. BRIGHTER primarily covers low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances labeled by fluent speakers. We highlight the challenges related to the data collection and annotation processes, and then report experimental results for monolingual and crosslingual multi-label emotion identification, as well as emotion intensity recognition. We analyse the variability in performance across languages and text domains, both with and without the use of LLMs, and show that the BRIGHTER datasets represent a meaningful step towards addressing the gap in text-based emotion recognition.
Submitted 29 May, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Discretionary vs nondiscretionary in fiscal mechanism. Non-automatic fiscal stabilisers vs automatic fiscal stabilisers
Authors:
Vasile Bratian,
Amelia Bucur,
Camelia Oprean,
Cristina Tanasescu
Abstract:
The goal of the present study is to increase the intelligibility of macroeconomic phenomena triggered by governmental intervention in the economy by means of fiscal policies. During cyclical movements, fiscal policy can play an important role in helping to stabilise the economy. But discretionary policy usually implies implementation lags and is not automatically reversed when economic conditions change. In contrast, automatic fiscal stabilisers (SFA) ensure a prompter and self-correcting fiscal response. The present study tackles the discretionary vs nondiscretionary character of fiscal stabilisers (SF). In this context, the research aims to launch a scientific debate over the definitions of the concepts of non-automatic fiscal stabilisers (SfnA) and SFAs. We describe how to quantify the discretionary and non-discretionary character of fiscal policy by analysing the structure of the conventional budget balance (SBc), the budget balance associated with the current GDP. In the final part of this article, we propose a quantitative equilibrium model for establishing the mathematical prerequisites for an SF to become automatic. Likewise, on the basis of the proposed mathematical model, we perform a qualitative analysis of the influence factors.
Submitted 15 January, 2025;
originally announced January 2025.
-
An Approach on the Modelling of Long Economic Cycles in the Context of Sustainable Development
Authors:
Cristina Tanasescu,
Amelia Bucur,
Camelia Oprean-Stan
Abstract:
Economic cycles are an increasingly frequent theme in the specialised literature. Their analysis is useful for long-term prediction, for finding solutions for economic growth, and for detecting economic crises. At the same time, many scientific and research papers underline the importance of sustainable development for present and future society. In this paper we aim to contribute to the study of the cycles of a sustainable economy, analysing them with the goal of creating such an economy in mind. We show that the curves representing these cycles are no longer simple logistic, bi-logistic, or multi-logistic curves, but plane curves obtained by composing logistic functions with the function of sustainable development or with the function that mathematically models its economic component. We also present an interpretation of the mathematical models within the framework of sustainable development.
Submitted 14 January, 2025;
originally announced January 2025.
-
On the State of NLP Approaches to Modeling Depression in Social Media: A Post-COVID-19 Outlook
Authors:
Ana-Maria Bucur,
Andreea-Codrina Moldovan,
Krutika Parvatikar,
Marcos Zampieri,
Ashiqur R. KhudaBukhsh,
Liviu P. Dinu
Abstract:
Computational approaches to predicting mental health conditions in social media have been substantially explored in the past years. Multiple reviews have been published on this topic, providing the community with comprehensive accounts of the research in this area. Among all mental health conditions, depression is the most widely studied due to its worldwide prevalence. The COVID-19 global pandemic, starting in early 2020, has had a great impact on mental health worldwide. Harsh measures employed by governments to slow the spread of the virus (e.g., lockdowns) and the subsequent economic downturn experienced in many countries have significantly impacted people's lives and mental health. Studies have shown a substantial increase of above 50% in the rate of depression in the population. In this context, we present a review on natural language processing (NLP) approaches to modeling depression in social media, providing the reader with a post-COVID-19 outlook. This review contributes to the understanding of the impacts of the pandemic on modeling depression in social media. We outline how state-of-the-art approaches and new datasets have been used in the context of the COVID-19 pandemic. Finally, we also discuss ethical issues in collecting and processing mental health data, considering fairness, accountability, and ethics.
Submitted 7 March, 2025; v1 submitted 11 October, 2024;
originally announced October 2024.
-
RoMath: A Mathematical Reasoning Benchmark in Romanian
Authors:
Adrian Cosma,
Ana-Maria Bucur,
Emilian Radoi
Abstract:
Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there is a growing need to understand informal mathematical text, yet most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three subsets: Baccalaureate, Competitions and Synthetic, which cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several open-weight language models, highlighting the importance of creating resources for underrepresented languages. Code and datasets are made available.
Submitted 20 May, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
Authors:
David Gimeno-Gómez,
Ana-Maria Bucur,
Adrian Cosma,
Carlos-David Martínez-Hinarejos,
Paolo Rosso
Abstract:
Depression, a prominent contributor to global disability, affects a substantial portion of the population. Efforts to detect depression from social media texts have been prevalent, yet only a few works explored depression detection from user-generated video content. In this work, we address this research gap by proposing a simple and flexible multi-modal temporal model capable of discerning non-verbal depression cues from diverse modalities in noisy, real-world videos. We show that, for in-the-wild videos, using additional high-level non-verbal cues is crucial to achieving good performance, and we extracted and processed audio speech embeddings, face emotion embeddings, face, body and hand landmarks, and gaze and blinking information. Through extensive experiments, we show that our model achieves state-of-the-art results on three key benchmark datasets for depression detection from video by a substantial margin. Our code is publicly available on GitHub.
Submitted 5 January, 2024;
originally announced January 2024.
-
Frobenius sign separation for abelian varieties
Authors:
Alina Bucur,
Francesc Fité,
Kiran S. Kedlaya
Abstract:
Let A and A' be nonzero abelian varieties defined over a number field k such that Hom(A,A')=0. Under the Generalized Riemann hypothesis for motivic L-functions attached to A and A', we show that there exists a prime p of k of good reduction for A and A' at which the Frobenius traces of A and A' are nonzero and differ by sign, and such that the norm of p is O_{k,g,g'}(log(2NN')^2), where N and N' respectively denote the absolute conductors of A and A'. We also make the dependence of the big-O constant on k and the dimensions g,g' of A,A' explicit up to an effectively computable absolute constant. Our method extends that of Chen, Park, and Swaminathan who considered the case in which A and A' are elliptic curves.
Submitted 3 April, 2025; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Automatic Extraction of the Romanian Academic Word List: Data and Methods
Authors:
Ana-Maria Bucur,
Andreea Dincă,
Mădălina Chitez,
Roxana Rogobete
Abstract:
This paper presents the methodology and data used for the automatic extraction of the Romanian Academic Word List (Ro-AWL). Academic Word Lists are useful in both L2 and L1 teaching contexts. For the Romanian language, no such resource exists so far. Ro-AWL has been generated by combining methods from corpus and computational linguistics with L2 academic writing approaches. We use two types of data: (a) existing data, such as the Romanian Frequency List based on the ROMBAC corpus, and (b) self-compiled data, such as the expert academic writing corpus EXPRES. For constructing the academic word list, we follow the methodology for building the Academic Vocabulary List for the English language. The distribution of Ro-AWL features (general distribution, POS distribution) into four disciplinary datasets is in line with previous research. Ro-AWL is freely available and can be used for teaching, research and NLP applications.
Submitted 29 July, 2023;
originally announced July 2023.
-
Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media
Authors:
Ana-Maria Bucur
Abstract:
In this work, we present the contribution of the BLUE team in the eRisk Lab task on searching for symptoms of depression. The task consists of retrieving and ranking Reddit social media sentences that convey symptoms of depression from the BDI-II questionnaire. Given that synthetic data provided by LLMs have been proven to be a reliable method for augmenting data and fine-tuning downstream models, we chose to generate synthetic data using ChatGPT for each of the symptoms of the BDI-II questionnaire. We designed a prompt such that the generated data contains more richness and semantic diversity than the BDI-II responses for each question and, at the same time, contains emotional and anecdotal experiences that are specific to the more intimate way of sharing experiences on Reddit. We perform semantic search and rank the sentences' relevance to the BDI-II symptoms by cosine similarity. We used two state-of-the-art transformer-based models (MentalRoBERTa and a variant of MPNet) for embedding the social media posts and the original and generated responses of the BDI-II. Our results show that using sentence embeddings from a model designed for semantic search outperforms the approach using embeddings from a model pre-trained on mental health data. Furthermore, the generated synthetic data proved too specific for this task; the approach relying only on the BDI-II responses had the best performance.
Submitted 6 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
It's Just a Matter of Time: Detecting Depression with Time-Enriched Multimodal Transformers
Authors:
Ana-Maria Bucur,
Adrian Cosma,
Paolo Rosso,
Liviu P. Dinu
Abstract:
Depression detection from user-generated content on the internet has been a long-lasting topic of interest in the research community, providing valuable screening tools for psychologists. The ubiquitous use of social media platforms lays out the perfect avenue for exploring mental health manifestations in posts and interactions with other users. Current methods for depression detection from social media mainly focus on text processing, and only a few also utilize images posted by users. In this work, we propose a flexible time-enriched multimodal transformer architecture for detecting depression from social media posts, using pretrained models for extracting image and text embeddings. Our model operates directly at the user-level, and we enrich it with the relative time between posts by using time2vec positional embeddings. Moreover, we propose another model variant, which can operate on randomly sampled and unordered sets of posts to be more robust to dataset noise. We show that our method, using EmoBERTa and CLIP embeddings, surpasses other methods on two multimodal datasets, obtaining state-of-the-art results of 0.931 F1 score on a popular multimodal Twitter dataset, and 0.902 F1 score on the only multimodal Reddit dataset.
Submitted 6 February, 2023; v1 submitted 13 January, 2023;
originally announced January 2023.
-
Power-saving error terms for the number of $D_4$-quartic extensions over a number field ordered by discriminant
Authors:
Alina Bucur,
Alexandra Florea,
Allechar Serrano López,
Ila Varma
Abstract:
We study the asymptotic count of dihedral quartic extensions over a fixed number field with bounded norm of the relative discriminant. The main term of this count (including a summation formula for the constant) can be found in the literature (see Cohen--Diaz y Diaz--Olivier for the statement without proof and see Klüners for a proof), but a power-saving for the error term has not been explicitly determined except in the case that the base field is $\mathbb{Q}$. In this article, we describe the argument for obtaining both the explicit main term and a power-saving error term for the number of $D_4$-quartic extensions over a general base number field ordered by the norms of their relative discriminants. We also give an extensive overview of the history and development of number field asymptotics.
Submitted 27 September, 2022;
originally announced September 2022.
-
An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder
Authors:
Ana-Maria Bucur,
Adrian Cosma,
Liviu P. Dinu,
Paolo Rosso
Abstract:
This work proposes a transformer architecture for user-level classification of gambling addiction and depression that is trainable end-to-end. As opposed to other methods that operate at the post level, we process a set of social media posts from a particular individual, to make use of the interactions between posts and eliminate label noise at the post level. We exploit the fact that, by not injecting positional encodings, multi-head attention is permutation invariant and we process randomly sampled sets of texts from a user after being encoded with a modern pretrained sentence encoder (RoBERTa / MiniLM). Moreover, our architecture is interpretable with modern feature attribution methods and allows for automatic dataset creation by identifying discriminating posts in a user's text-set. We perform ablation studies on hyper-parameters and evaluate our method for the eRisk 2022 Lab on early detection of signs of pathological gambling and early risk detection of depression. The method proposed by our team BLUE obtained the best ERDE5 score of 0.015, and the second-best ERDE50 score of 0.009 for pathological gambling detection. For the early detection of depression, we obtained the second-best ERDE50 of 0.027.
Submitted 2 July, 2022;
originally announced July 2022.
-
Life is not Always Depressing: Exploring the Happy Moments of People Diagnosed with Depression
Authors:
Ana-Maria Bucur,
Adrian Cosma,
Liviu P. Dinu
Abstract:
In this work, we explore the relationship between depression and manifestations of happiness in social media. While the majority of works surrounding depression focus on symptoms, psychological research shows that there is a strong link between seeking happiness and being diagnosed with depression. We make use of the Positive-Unlabeled learning paradigm to automatically extract happy moments from social media posts of both controls and users diagnosed with depression, and qualitatively analyze them with linguistic tools such as LIWC and keyness information. We show that the life of depressed individuals is not always bleak, with positive events related to friends and family being more noteworthy to their lives compared to the more mundane happy events reported by control users.
Submitted 8 May, 2022; v1 submitted 28 April, 2022;
originally announced April 2022.
-
BLUE at Memotion 2.0 2022: You have my Image, my Text and my Transformer
Authors:
Ana-Maria Bucur,
Adrian Cosma,
Ioan-Bogdan Iordache
Abstract:
Memes are prevalent on the internet and continue to grow and evolve alongside our culture. An automatic understanding of memes propagating on the internet can shed light on the general sentiment and cultural attitudes of people. In this work, we present team BLUE's solution for the second edition of the MEMOTION shared task. We showcase two approaches for meme classification (i.e. sentiment, humour, offensive, sarcasm and motivation levels): a text-only method using BERT, and a Multi-Modal-Multi-Task transformer network that operates on both the meme image and its caption to output the final scores. In both approaches, we leverage state-of-the-art pretrained models for text (BERT, Sentence Transformer) and image processing (EfficientNetV4, CLIP). Through our efforts, we obtain first place in task A, second place in task B and third place in task C. In addition, our team obtained the highest average score for all three tasks.
Submitted 4 April, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Sequence-to-Sequence Lexical Normalization with Multilingual Transformers
Authors:
Ana-Maria Bucur,
Adrian Cosma,
Liviu P. Dinu
Abstract:
Current benchmark tasks for natural language processing contain text that is qualitatively different from the text used in informal day-to-day digital communication. This discrepancy has led to severe performance degradation of state-of-the-art NLP models when fine-tuned on real-world data. One way to resolve this issue is through lexical normalization, which is the process of transforming non-standard text, usually from social media, into a more standardized form. In this work, we propose a sentence-level sequence-to-sequence model based on mBART, which frames the problem as a machine translation problem. As noisy text is a pervasive problem across languages, not just English, we leverage the multi-lingual pre-training of mBART to fine-tune it to our data. While current approaches mainly operate at the word or subword level, we argue that this approach is straightforward from a technical standpoint and builds upon existing pre-trained transformer networks. Our results show that while word-level, intrinsic performance evaluation is behind other methods, our model improves performance on extrinsic, downstream tasks through normalization compared to models operating on raw, unprocessed social media text.
Submitted 12 October, 2021; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Geometric generalizations of the square sieve, with an application to cyclic covers
Authors:
Alina Bucur,
Alina Carmen Cojocaru,
Matilde N. Lalín,
Lillian B. Pierce
Abstract:
We formulate a general problem: given projective schemes $\mathbb{Y}$ and $\mathbb{X}$ over a global field $K$ and a $K$-morphism $\eta$ from $\mathbb{Y}$ to $\mathbb{X}$ of finite degree, how many points in $\mathbb{X}(K)$ of height at most $B$ have a pre-image under $\eta$ in $\mathbb{Y}(K)$? This problem is inspired by a well-known conjecture of Serre on quantitative upper bounds for the number of points of bounded height on an irreducible projective variety defined over a number field. We give a non-trivial answer to the general problem when $K=\mathbb{F}_q(T)$ and $\mathbb{Y}$ is a prime degree cyclic cover of $\mathbb{X}=\mathbb{P}_{K}^n$. Our tool is a new geometric sieve, which generalizes the polynomial sieve to a geometric setting over global function fields.
Submitted 22 August, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
A Psychologically Informed Part-of-Speech Analysis of Depression in Social Media
Authors:
Ana-Maria Bucur,
Ioana R. Podină,
Liviu P. Dinu
Abstract:
In this work, we provide an extensive part-of-speech analysis of the discourse of social media users with depression. Research in psychology revealed that depressed users tend to be self-focused, more preoccupied with themselves and ruminate more about their lives and emotions. Our work aims to make use of large-scale datasets and computational methods for a quantitative exploration of discourse. We use the publicly available depression dataset from the Early Risk Prediction on the Internet Workshop (eRisk) 2018 and extract part-of-speech features and several indices based on them. Our results reveal statistically significant differences between the depressed and non-depressed individuals confirming findings from the existing psychology literature. Our work provides insights regarding the way in which depressed individuals are expressing themselves on social media platforms, allowing for better-informed computational models to help monitor and prevent mental illnesses.
Submitted 31 July, 2021;
originally announced August 2021.
-
Early Risk Detection of Pathological Gambling, Self-Harm and Depression Using BERT
Authors:
Ana-Maria Bucur,
Adrian Cosma,
Liviu P. Dinu
Abstract:
Early risk detection of mental illnesses has a massive positive impact upon the well-being of people. The eRisk workshop has been at the forefront of enabling interdisciplinary research in developing computational methods to automatically estimate early risk factors for mental issues such as depression, self-harm, anorexia and pathological gambling. In this paper, we present the contributions of the BLUE team in the 2021 edition of the workshop, in which we tackle the problems of early detection of gambling addiction, self-harm and estimating depression severity from social media posts. We employ pre-trained BERT transformers and data crawled automatically from mental health subreddits and obtain reasonable results on all three tasks.
Submitted 30 June, 2021;
originally announced June 2021.
-
An Exploratory Analysis of the Relation Between Offensive Language and Mental Health
Authors:
Ana-Maria Bucur,
Marcos Zampieri,
Liviu P. Dinu
Abstract:
In this paper, we analyze the interplay between the use of offensive language and mental health. We acquired publicly available datasets created for offensive language identification and depression detection and we train computational models to compare the use of offensive language in social media posts written by groups of individuals with and without self-reported depression diagnosis. We also look at samples written by groups of individuals whose posts show signs of depression according to recent related studies. Our analysis indicates that offensive language is more frequently used in the samples written by individuals with self-reported depression as well as individuals showing signs of depression. The results discussed here open new avenues in research in politeness/offensiveness and mental health.
Submitted 24 June, 2021; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Detecting Early Onset of Depression from Social Media Text using Learned Confidence Scores
Authors:
Ana-Maria Bucur,
Liviu P. Dinu
Abstract:
Computational research on mental health disorders from written texts covers an interdisciplinary area between natural language processing and psychology. A crucial aspect of this problem is prevention and early diagnosis, as suicide resulting from depression is the second leading cause of death for young adults. In this work, we focus on methods for detecting the early onset of depression from social media texts, in particular from Reddit. To that end, we explore the eRisk 2018 dataset and achieve good results with regard to the state of the art by leveraging topic analysis and learned confidence scores to guide the decision process.
Submitted 3 November, 2020;
originally announced November 2020.
-
Effective Sato-Tate conjecture for abelian varieties and applications
Authors:
Alina Bucur,
Francesc Fité,
Kiran S. Kedlaya
Abstract:
From the generalized Riemann hypothesis for motivic L-functions, we derive an effective version of the Sato-Tate conjecture for an abelian variety A defined over a number field k with connected Sato-Tate group. By effective we mean that we give an upper bound on the error term in the count predicted by the Sato-Tate measure that only depends on certain invariants of A. We discuss three applications of this conditional result. First, for an abelian variety defined over k, we consider a variant of Linnik's problem for abelian varieties that asks for an upper bound on the least norm of a prime whose normalized Frobenius trace lies in a given interval. Second, for an elliptic curve defined over k with complex multiplication, we determine (up to multiplication by a nonzero constant) the asymptotic number of primes whose Frobenius trace attains the integral part of the Hasse-Weil bound. Third, for a pair of abelian varieties defined over k with no common factors up to k-isogeny, we find an upper bound on the least norm of a prime at which the respective Frobenius traces have opposite sign.
Submitted 13 October, 2023; v1 submitted 20 February, 2020;
originally announced February 2020.
-
A Visual Representation of Wittgenstein's Tractatus Logico-Philosophicus
Authors:
Anca Bucur,
Sergiu Nisioi
Abstract:
In this paper we present a data visualization method together with its potential usefulness in digital humanities and philosophy of language. We compile a multilingual parallel corpus from different versions of Wittgenstein's Tractatus Logico-Philosophicus, including the original in German and translations into English, Spanish, French, and Russian. Using this corpus, we compute a similarity measure between propositions and render a visual network of relations for different languages.
Submitted 13 March, 2017;
originally announced March 2017.
-
Traces, high powers and one level density for families of curves over finite fields
Authors:
Alina Bucur,
Edgar Costa,
Chantal David,
João Guerreiro,
David Lowry-Duda
Abstract:
The zeta function of a curve $C$ over a finite field may be expressed in terms of the characteristic polynomial of a unitary matrix $Θ_C$.
We develop and present a new technique to compute the expected value of $\mathrm{Tr}(Θ_C^n)$ for various moduli spaces of curves of genus $g$ over a fixed finite field in the limit as $g$ is large, generalizing and extending the work of Rudnick and Chinis.
This is achieved by using function field zeta functions, explicit formulae, and the densities of prime polynomials with prescribed ramification types at certain places as given by Bucur, David, Feigon, Kaplan, Lalín and Wood [BDF$^+$16] and by Zhao.
We extend [BDF$^+$16] by describing explicit dependence on the place and give an explicit proof of the Lindelöf bound for function field Dirichlet $L$-functions $L(1/2 + it, χ)$.
As applications, we compute the one-level density for hyperelliptic curves, cyclic $\ell$-covers, and cubic non-Galois covers.
Submitted 1 October, 2016;
originally announced October 2016.
-
The distribution of $\mathbb{F}_q$-points on cyclic $\ell$-covers of genus $g$
Authors:
Alina Bucur,
Chantal David,
Brooke Feigon,
Nathan Kaplan,
Matilde Lalín,
Ekin Ozman,
Melanie Matchett Wood
Abstract:
We study fluctuations in the number of points of $\ell$-cyclic covers of the projective line over the finite field $\mathbb{F}_q$ when $q \equiv 1 \mod \ell$ is fixed and the genus tends to infinity. The distribution is given as a sum of $q+1$ i.i.d. random variables. This was settled for hyperelliptic curves by Kurlberg and Rudnick, while statistics were obtained for certain components of the moduli space of $\ell$-cyclic covers by Bucur, David, Feigon and Lalín. In this paper, we obtain statistics for the distribution of the number of points as the covers vary over the full moduli space of $\ell$-cyclic covers of genus $g$. This is achieved by relating $\ell$-covers to cyclic function field extensions, and counting such extensions with prescribed ramification and splitting conditions at a finite number of primes.
Submitted 26 May, 2015;
originally announced May 2015.
-
Statistics for biquadratic covers of the projective line over finite fields
Authors:
Elisa Lorenzo,
Giulio Meleleo,
Piermarco Milione,
Alina Bucur
Abstract:
We study the distribution of the traces of the Frobenius endomorphism of genus $g$ curves which are quartic non-cyclic covers of $\mathbb{P}^{1}_{\mathbb{F}_{q}}$, as the curve varies in an irreducible component of the moduli space. We show that for $q$ fixed, the limiting distribution of the trace of Frobenius equals the sum of $q + 1$ independent random discrete variables. We also show that when both $g$ and $q$ go to infinity, the normalized trace has a standard complex Gaussian distribution. Finally, we extend these computations to the general case of arbitrary covers of $\mathbb{P}^{1}_{\mathbb{F}_{q}}$ with Galois group isomorphic to $r$ copies of $\mathbb{Z}/2\mathbb{Z}$. For $r = 1$, we recover the already known hyperelliptic case. We also include an appendix by Alina Bucur giving the heuristic of these distributions.
Submitted 20 October, 2015; v1 submitted 11 March, 2015;
originally announced March 2015.
-
Statistics for ordinary Artin-Schreier covers and other $p$-rank strata
Authors:
Alina Bucur,
Chantal David,
Brooke Feigon,
Matilde Lalin
Abstract:
We study the distribution of the number of points and of the zeroes of the zeta function in different $p$-rank strata of Artin-Schreier covers over $\mathbb{F}_q$ when $q$ is fixed and the genus goes to infinity. The $p$-rank strata considered include the ordinary family, the whole family, and the family of curves with $p$-rank equal to $p-1$. While the zeta zeroes always approach the standard Gaussian distribution, the number of points over $\mathbb{F}_q$ has a distribution that varies with the specific family.
Submitted 30 April, 2013;
originally announced April 2013.
-
An application of the effective Sato-Tate conjecture
Authors:
Alina Bucur,
Kiran S. Kedlaya
Abstract:
Based on the Lagarias-Odlyzko effectivization of the Chebotarev density theorem, Kumar Murty gave an effective version of the Sato-Tate conjecture for an elliptic curve conditional on analytic continuation and Riemann hypothesis for the symmetric power $L$-functions. We use Murty's analysis to give a similar conditional effectivization of the generalized Sato-Tate conjecture for an arbitrary motive. As an application, we give a conditional upper bound of the form $O((\log N)^2 (\log \log 2N)^2)$ for the smallest prime at which two given rational elliptic curves with conductor at most $N$ have Frobenius traces of opposite sign.
Submitted 7 June, 2015; v1 submitted 1 January, 2013;
originally announced January 2013.
-
Distribution of zeta zeroes of Artin--Schreier curves
Authors:
Alina Bucur,
Chantal David,
Brooke Feigon,
Matilde Lalin,
Kaneenika Sinha
Abstract:
We study the distribution of the zeroes of the zeta functions of the family of Artin-Schreier covers of the projective line over $\mathbb{F}_q$ when $q$ is fixed and the genus goes to infinity. We consider both the global and the mesoscopic regimes, proving that when the genus goes to infinity, the number of zeroes with angles in a prescribed non-trivial subinterval of $[-π,π)$ has a standard Gaussian distribution (when properly normalized).
Submitted 29 December, 2012; v1 submitted 20 November, 2011;
originally announced November 2011.
-
The probability that a complete intersection is smooth
Authors:
Alina Bucur,
Kiran S. Kedlaya
Abstract:
Given a smooth subscheme of a projective space over a finite field, we compute the probability that its intersection with a fixed number of hypersurface sections of large degree is smooth of the expected dimension. This generalizes the case of a single hypersurface, due to Poonen. We use this result to give a probabilistic model for the number of rational points of such a complete intersection. A somewhat surprising corollary is that the number of rational points on a random smooth intersection of two surfaces in projective 3-space is strictly less than the number of points on the projective line.
Submitted 15 October, 2012; v1 submitted 26 March, 2010;
originally announced March 2010.
-
The fluctuations in the number of points of smooth plane curves over finite fields
Authors:
Alina Bucur,
Chantal David,
Brooke Feigon,
Matilde Lalín
Abstract:
In this note, we study the fluctuations in the number of points of smooth projective plane curves over finite fields $\mathbb{F}_q$ as $q$ is fixed and the genus varies. More precisely, we show that these fluctuations are predicted by a natural probabilistic model, in which the points of the projective plane impose independent conditions on the curve. The main tool we use is a geometric sieving process introduced by Poonen.
Submitted 23 December, 2009;
originally announced December 2009.
-
Statistics for traces of cyclic trigonal curves over finite fields
Authors:
Alina Bucur,
Chantal David,
Brooke Feigon,
Matilde Lalín
Abstract:
We study the variation of the trace of the Frobenius endomorphism associated to a cyclic trigonal curve of genus $g$ over a field of $q$ elements as the curve varies in an irreducible component of the moduli space. We show that for $q$ fixed and $g$ increasing, the limiting distribution of the trace of the Frobenius equals the sum of $q+1$ independent random variables taking the value $0$ with probability $2/(q+2)$ and $1$, $e^{2\pi i/3}$, $e^{4\pi i/3}$ each with probability $q/(3(q+2))$. This extends the work of Kurlberg and Rudnick who considered the same limit for hyperelliptic curves. We also show that when both $g$ and $q$ go to infinity, the normalized trace has a standard complex Gaussian distribution and how to generalize these results to $p$-fold covers of the projective line.
Submitted 11 September, 2009; v1 submitted 30 July, 2009;
originally announced July 2009.