Search | arXiv e-print repository

Bayesian Dynamic Clustering Factor Models

Authors: Tsering Dolkar, Marco A. R. Ferreira, Hwasoo Shin, Allison N. Tegge

Abstract: We propose novel Bayesian Dynamic Clustering Factor Models (BDCFM) for the analysis of multivariate longitudinal data. BDCFM combines factor models with hidden Markov models to concomitantly perform dimension reduction, clustering, and estimation of the dynamic transitions of subjects through clusters. We develop an efficient Gibbs sampler for exploration of the posterior distribution. An analysis of a simulated dataset shows that our inferential approach works well both at parameter estimation and clustering of subjects. Finally, we illustrate the utility of our BDCFM with an analysis of a dataset on opioid use disorder.

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.05280 [pdf, other]

Bayesian Clustering Factor Models

Authors: Hwasoo Shin, Marco A. R. Ferreira, Allison N. Tegge

Abstract: We present a novel framework for concomitant dimension reduction and clustering. This framework is based on a novel class of Bayesian clustering factor models. These models assume a factor model structure where the vectors of common factors follow a mixture of Gaussian distributions. We develop a Gibbs sampler to explore the posterior distribution and propose an information criterion to select the number of clusters and the number of factors. Simulation studies show that our inferential approach appropriately quantifies uncertainty. In addition, when compared to a previously published competitor method, our information criterion has favorable performance in terms of correct selection of number of clusters and number of factors. Finally, we illustrate the capabilities of our framework with an application to data on recovery from opioid use disorder where clustering of individuals may facilitate personalized health care.

Submitted 8 May, 2025; originally announced May 2025.

arXiv:2410.17376 [pdf, other]

Lorentz-violating Yukawa theory at finite temperature

Authors: D. S. Cabral, L. A. S. Evangelista, J. C. R. de Souza, L. H. A. R. Ferreira, A. F. Santos

Abstract: This paper addresses Yukawa theory, focusing on the scattering between two identical fermions mediated by an intermediate scalar boson, considering the effects of thermal contributions and Lorentz symmetry breaking. Temperature is introduced into the theory through the TFD formalism, while Lorentz violation arises from a background tensor coupled to the kinetic part of the Klein-Gordon Lagrangian. Two important quantities are calculated: the cross-section for the scattering process and the modified Yukawa potential. The main results obtained in this work demonstrate that considering Lorentz symmetry breaking has several implications for changes in symmetries and physical states, while the presence of temperature is strongly related to the strength of the interaction. This interplay between symmetry breaking and temperature effects provides deeper insights into the behavior of the Yukawa theory under different conditions.

Submitted 22 October, 2024; originally announced October 2024.

Comments: 20 pages, 4 figures

arXiv:2406.13574 [pdf, other]

Scalar sector of the Myers and Pospelov model: thermal and size effects

Authors: L. H. A. R. Ferreira, A. F. Santos, Carlos M. Reyes

Abstract: The scalar sector of the Myers and Pospelov model is considered. This theory introduces a dimension 5 operator with a preferred four-vector which breaks Lorentz symmetry. We investigate various applications using the TFD formalism, a topological field theory that allows the study of thermal and size effects on an equal footing. In this context, Lorentz-violating corrections to the Casimir effect and Stefan-Boltzmann law have been calculated.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 17 pages

arXiv:2309.12802 [pdf, other]

doi 10.21528/CBIC2023-169

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

Authors: Alexandre R. Ferreira, Cláudio E. C. Campelo

Abstract: To train transcriptor models that produce robust results, a large and diverse labeled dataset is required. Finding such data with the necessary characteristics is a challenging task, especially for languages less popular than English. Moreover, producing such data requires significant effort and often money. Therefore, a strategy to mitigate this problem is the use of data augmentation techniques. In this work, we propose a framework that approaches data augmentation based on deepfake audio. To validate the produced framework, experiments were conducted using existing deepfake and transcription models. A voice cloner and a dataset produced by Indians (in English) were selected, ensuring the presence of a single accent in the dataset. Subsequently, the augmented data was used to train speech to text models in various scenarios.

Submitted 22 September, 2023; originally announced September 2023.

Comments: 9 pages, 6 figures, 7 tables

ACM Class: I.2.6; I.2.0; E.0

arXiv:2306.03876 [pdf, other]

Aether field coupled to the electromagnetic field in the TFD formalism

Authors: R. Corrêa, L. H. A. R. Ferreira, A. F. Santos, Faqir C. Khanna

Abstract: In this paper, the aether field, which leads to the violation of Lorentz symmetries, coupled with the electromagnetic field is considered. In order to study thermal and size effects in this theory, the Thermo Field Dynamics (TFD) formalism is used. TFD is a real-time quantum field theory that has an interesting topological structure. Here three different topologies are taken, then three different phenomena are calculated. These effects are investigated considering that the aether field can point in different directions. The results obtained are compared with the usual results of the Lorentz invariant electromagnetic field.

Submitted 6 June, 2023; originally announced June 2023.

Comments: 15 pages, accepted for publication in EPJP

arXiv:2210.16683 [pdf, other]

A Bayesian Hierarchical Model Framework to Quantify Uncertainty of Tropical Cyclone Precipitation Forecasts

Authors: Stephen A. Walsh, Marco A. R. Ferreira, Dave Higdon, Stephanie Zick

Abstract: Tropical cyclones present a serious threat to many coastal communities around the world. Many numerical weather prediction models provide deterministic forecasts with limited measures of their forecast uncertainty. Standard postprocessing techniques may struggle with extreme events or use a 30-day training window that will not adequately characterize the uncertainty of a tropical cyclone forecast. We propose a novel approach that leverages information from past storm events, using a hierarchical model to quantify uncertainty in the spatial correlation parameters of the forecast errors (modeled as Gaussian processes) for a numerical weather prediction model. This approach addresses a massive data problem by implementing a drastic dimension reduction through the assumption that the MLE and Hessian matrix represent all useful information from each tropical cyclone. From this, simulated forecast errors provide uncertainty quantification for future tropical cyclone forecasts. We apply this method to the North American Mesoscale model forecasts and use observations based on the Stage IV data product for 47 tropical cyclones between 2004 and 2017. For an incoming storm, our hierarchical framework combines the forecast from the North American Mesoscale model with the information from previous storms to create 95% and 99% prediction maps of rain. For six test storms from 2018 and 2019, these maps provide appropriate probabilistic coverage of observations. We show evidence from the log scoring rule that the proposed hierarchical framework performs best among competing methods.

Submitted 29 October, 2022; originally announced October 2022.

Comments: 42 pages, 18 figures

arXiv:2208.10996 [pdf, other]

An Evolutionary Approach for Creating of Diverse Classifier Ensembles

Authors: Alvaro R. Ferreira Jr, Fabio A. Faria, Gustavo Carneiro, Vinicius V. de Melo

Abstract: Classification is one of the most studied tasks in data mining and machine learning areas and many works in the literature have been presented to solve classification problems for multiple fields of knowledge such as medicine, biology, security, and remote sensing. Since there is no single classifier that achieves the best results for all kinds of applications, a good alternative is to adopt classifier fusion strategies. A key point in the success of classifier fusion approaches is the combination of diversity and accuracy among classifiers belonging to an ensemble. With a large amount of classification models available in the literature, one challenge is the choice of the most suitable classifiers to compose the final classification system, which generates the need of classifier selection strategies. We address this point by proposing a framework for classifier selection and fusion based on a four-step protocol called CIF-E (Classifiers, Initialization, Fitness function, and Evolutionary algorithm). We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol and we are able to find the most accurate approach. A comparative analysis has also been performed among the best approaches and many other baselines from the literature. The experiments show that the proposed evolutionary approach based on Univariate Marginal Distribution Algorithm (UMDA) can outperform the state-of-the-art literature approaches in many well-known UCI datasets.

Submitted 23 August, 2022; originally announced August 2022.

arXiv:2206.00869 [pdf, ps, other]

Spatiotemporal models for Poisson areal data with an application to the AIDS epidemic in Rio de Janeiro

Authors: Marco A. R. Ferreira, Juan C. Vivar

Abstract: We present a class of spatiotemporal models for Poisson areal data suitable for the analysis of emerging infectious diseases. These models assume Poisson observations related through a link equation to a latent random field process. This latent random field process evolves through time with proper Gaussian Markov random field convolutions. Our approach naturally accommodates flexible structures such as distinct but interacting temporal trends for each region and across-time contamination among neighboring regions. We develop a Bayesian analysis approach with a simulation-based procedure: specifically, we construct a Markov chain Monte Carlo algorithm based on the generalized extended Kalman filter to obtain samples from an approximate posterior distribution. Finally, for the comparison of Poisson spatiotemporal models, we develop a simulation-based conditional Bayes factor. We illustrate the utility and flexibility of our Poisson spatiotemporal framework with an application to the number of acquired immunodeficiency syndrome (AIDS) cases during the period 1982-2007 in Rio de Janeiro.

Submitted 2 June, 2022; originally announced June 2022.

Comments: 26 pages, 2 figures, 2 tables

arXiv:2112.09614 [pdf, ps, other]

doi 10.1016/j.physletb.2021.136845

TFD formalism: applications to the scalar field in a Lorentz-violating theory

Authors: L. H. A. R. Ferreira, A. F. Santos, Faqir C. Khanna

Abstract: The Thermofield Dynamics (TFD) formalism is considered. In this context, a Lorentz-breaking scalar field theory is introduced. In contrast to the Matsubara formalism, the best-known approach to introducing the temperature effect, TFD is a real-time formalism and is a topological field theory. While in Matsubara the temperature effects are introduced as a consequence of a compactification of the field in a finite interval on the time axis, in the TFD this effect emerges through a condensed state related to the Bogoliubov transformation. An advantage of the TFD formalism is that different topologies, which lead to different effects, can be chosen. Here, three different topologies are considered. Then the Stefan-Boltzmann law and Casimir effect at zero and non-zero temperature are calculated. This is a unique feature of TFD, which allows us to treat different phenomena in the same way.

Submitted 17 December, 2021; originally announced December 2021.

Comments: 16 pages, accepted for publication in PLB

arXiv:2003.02722 [pdf, other]

doi 10.1103/PhysRevMaterials.4.113603

Chemical Bonding in Metallic Glasses from Machine Learning and Crystal Orbital Hamilton Population

Authors: Ary R. Ferreira

Abstract: The chemistry (composition and bonding information) of metallic glasses (MGs) is at least as important as structural topology for understanding their properties and production/processing peculiarities. This article reports a machine learning (ML)-based approach that brings an unprecedented "big picture" view of chemical bond strengths in MGs of a prototypical alloy system. The connection between electronic structure and chemical bonding is given by crystal orbital Hamilton population (COHP) analysis; within the framework of density functional theory (DFT). The stated comprehensive overview is made possible through a combination of: efficient quantitative estimate of bond strengths supplied by COHP analysis; representative statistics regarding structure in terms of atomic configurations achieved with classical molecular dynamics simulations; and the smooth overlap of atomic positions (SOAP) descriptor. The study is supplemented by an application of that ML model under the scope of mechanical loading; in which the resulting overview of chemical bond strengths revealed a chemical/structural heterogeneity that is in line with the tendency to bond exchange verified for atomic local environments. The encouraging results pave the way towards alternative approaches applicable in plenty of other contexts in which atom categorization (from the perspective of chemical bonds) plays a key role.

Submitted 21 July, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: This update contains a number of enhancements. The manuscript has been updated after an anonymous review process by a journal whose policies discourage posting material from this process online. Few changes have been done in Sections I, II and III-A; however, Section III-B and IV have undergone significant changes

Journal ref: Phys. Rev. Materials 4, 113603 (2020)

arXiv:1509.04838 [pdf, ps, other]

doi 10.1214/15-AOAS815

hmmSeq: A hidden Markov model for detecting differentially expressed genes from RNA-seq data

Authors: Shiqi Cui, Subharup Guha, Marco A. R. Ferreira, Allison N. Tegge

Abstract: We introduce hmmSeq, a model-based hierarchical Bayesian technique for detecting differentially expressed genes from RNA-seq data. Our novel hmmSeq methodology uses hidden Markov models to account for potential co-expression of neighboring genes. In addition, hmmSeq employs an integrated approach to studies with technical or biological replicates, automatically adjusting for any extra-Poisson variability. Moreover, for cases when paired data are available, hmmSeq includes a paired structure between treatments that incorporates subject-specific effects. To perform parameter estimation for the hmmSeq model, we develop an efficient Markov chain Monte Carlo algorithm. Further, we develop a procedure for detection of differentially expressed genes that automatically controls false discovery rate. A simulation study shows that the hmmSeq methodology performs better than competitors in terms of receiver operating characteristic curves. Finally, the analyses of three publicly available RNA-seq data sets demonstrate the power and flexibility of the hmmSeq methodology. An R package implementing the hmmSeq framework will be submitted to CRAN upon publication of the manuscript.

Submitted 16 September, 2015; originally announced September 2015.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS815 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS815

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 901-925

arXiv:1110.0231 [pdf, ps, other]

doi 10.1103/PhysRevB.84.235119

{\it Ab initio} $^{27}Al$ NMR chemical shifts and quadrupolar parameters for $Al_2O_3$ phases and their precursors

Authors: Ary R. Ferreira, Emine Küçükbenli, Alexandre A. Leitão, Stefano de Gironcoli

Abstract: The Gauge-Including Projector Augmented Wave (GIPAW) method, within the Density Functional Theory (DFT) Generalized Gradient Approximation (GGA) framework, is applied to compute solid state NMR parameters for $^{27}Al$ in the $α$, $θ$, and $κ$ aluminium oxide phases and their gibbsite and boehmite precursors. The results for well-established crystalline phases compare very well with available experimental data and provide confidence in the accuracy of the method. For $γ$-alumina, four structural models proposed in the literature are discussed in terms of their ability to reproduce the experimental spectra also reported in the literature. Among the considered models, the $Fd\bar{3}m$ structure proposed by Paglia {\it et al.} [Phys. Rev. B {\bf 71}, 224115 (2005)] shows the best agreement. We attempt to link the theoretical NMR parameters to the local geometry. Chemical shifts depend on coordination number but no further correlation is found with geometrical parameters. Instead our calculations reveal that, within a given coordination number, a linear correlation exists between chemical shifts and Born effective charges.

Submitted 2 October, 2011; originally announced October 2011.

Showing 1–13 of 13 results for author: Ferreira, A R