-
Bayesian Hybrid Machine Learning of Gallstone Risk
Authors:
Chitradipa Chakraborty,
Nayana Mukherjee
Abstract:
Gallstone disease is a complex, multifactorial condition with a significant global health burden. Identifying underlying risk factors and their interactions is crucial for early diagnosis, targeted prevention, and effective clinical management. Although logistic regression remains a standard tool for assessing associations between predictors and gallstone status, it often underperforms in high-dimensional settings and may fail to capture intricate relationships among variables. To address these limitations, we propose a hybrid machine learning framework that integrates robust variable selection with advanced interaction detection. Specifically, Adaptive LASSO is employed to identify a sparse and interpretable subset of influential features, followed by Bayesian Additive Regression Trees (BART) to model nonlinear effects and uncover key interactions. Selected interactions are further characterized using physiological knowledge, through differential-equation-informed interaction terms that ground the model in biologically plausible mechanisms. The insights gained from these steps are then integrated into a final Bayesian logistic regression model, balancing predictive accuracy and clinical interpretability. The proposed framework not only enhances prediction but also yields actionable insights, offering a valuable support tool for medical research and decision-making.
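The Adaptive LASSO selection step can be illustrated with a minimal pure-Python sketch. It assumes squared-error loss on a centred, continuous toy response with no intercept (the binary gallstone outcome would instead call for a penalized logistic loss), a near-unpenalized ridge fit for the initial estimates, and weights w_j = 1/|b_j|^gamma, so weak predictors receive heavier penalties. The function names and the toy data are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def coordinate_descent(X, y, lam=0.0, l1_weights=None, ridge=0.0, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| + (ridge/2) * ||b||^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X b, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    weights = l1_weights if l1_weights is not None else [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (j's own term added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / (col_sq[j] + ridge)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    # Stage 1: near-unpenalised ridge fit supplies the initial estimates.
    init = coordinate_descent(X, y, ridge=1e-3)
    # Stage 2: weighted L1 penalty, w_j = 1 / |init_j|^gamma (eps avoids division by zero).
    weights = [1.0 / (abs(b) + eps) ** gamma for b in init]
    beta = coordinate_descent(X, y, lam=lam, l1_weights=weights)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    return beta, selected


# Toy data: only features 0 and 2 carry signal.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
y = [2.0 * row[0] - 3.0 * row[2] + rng.gauss(0.0, 0.1) for row in X]
beta, selected = adaptive_lasso(X, y, lam=0.1)
print(selected, [round(b, 2) for b in beta])
```

Because noise features start with tiny initial estimates, their weights become very large and their coefficients are thresholded to exactly zero, which is what makes the selected subset sparse.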
Submitted 17 June, 2025;
originally announced June 2025.
-
Evaluating Usage of Images for App Classification
Authors:
Kushal Singla,
Niloy Mukherjee,
Hari Manassery Koduvely,
Joy Bose
Abstract:
App classification is useful in a number of applications, such as adding apps to an app store or building a user model based on the installed apps. A number of existing methods classify apps according to a given taxonomy on the basis of their text metadata. However, text-based methods for app classification may not work in all cases, such as when the text descriptions are in a different language, missing, or inadequate to classify the app. One solution in such cases is to use the app images to supplement the text description. In this paper, we evaluate several approaches in which app images can be used to classify apps. In one approach, we use optical character recognition (OCR) to extract text from the images, which then supplements the app's text description. In another, we use pic2vec to convert the app images into vectors, then train an SVM to classify the vectors into the correct app label. In a third, we use the captionbot.ai tool to generate natural-language descriptions from the app images. Finally, we detect and label objects in the app images and use a voting technique to determine the app's category from all of its images. We compare the performance of these image-based techniques on a number of apps in our dataset. Using a text-based SVM app classifier as our baseline, we obtain an improved classification accuracy of 96% for some classes when app images are added.
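The final voting step might look roughly like the sketch below. The object detector itself is not shown; the per-image label lists, the hypothetical LABEL_TO_CATEGORY mapping, and the two-level vote (each image votes for one category, then the app takes the majority over images) are illustrative assumptions rather than the authors' exact scheme.

```python
from collections import Counter

# Hypothetical mapping from detected object labels to app categories (illustrative only).
LABEL_TO_CATEGORY = {
    "ball": "Sports",
    "stadium": "Sports",
    "trophy": "Sports",
    "map": "Travel",
    "car": "Travel",
    "suitcase": "Travel",
}


def vote_for_image(labels, label_to_category):
    """One image votes for the category that most of its mapped object labels point to."""
    counts = Counter(label_to_category[l] for l in labels if l in label_to_category)
    if not counts:
        return None  # no recognisable objects: this image abstains
    return counts.most_common(1)[0][0]


def classify_app(images, label_to_category=LABEL_TO_CATEGORY):
    """Majority vote over per-image votes; ties go to the category encountered first."""
    votes = Counter()
    for labels in images:
        category = vote_for_image(labels, label_to_category)
        if category is not None:
            votes[category] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Label lists an object detector might return for three screenshots of one app.
screenshots = [
    ["ball", "person", "stadium"],
    ["ball", "trophy"],
    ["map", "car"],
]
print(classify_app(screenshots))
```

Letting each image cast a single vote keeps one cluttered screenshot with many detected objects from dominating the app-level decision; pooling all labels directly would be an equally simple alternative.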
Submitted 16 December, 2019;
originally announced December 2019.
-
Bayesian Learning of Dynamic Multilayer Networks
Authors:
Daniele Durante,
Nabanita Mukherjee,
Rebecca C. Steorts
Abstract:
A plethora of networks is being collected in a growing number of fields, including disease transmission, international relations, social interactions, and others. As data streams continue to grow, the complexity of these highly multidimensional connectivity data presents novel challenges. In this paper, we focus on the time-varying interconnections among a set of actors in multiple contexts, called layers. The current literature lacks flexible statistical models for dynamic multilayer networks that can improve inference and prediction by efficiently borrowing information within each network, across time, and between layers. Motivated by this gap, we develop a Bayesian nonparametric model leveraging latent space representations. Our formulation characterizes the edge probabilities as a function of shared and layer-specific actors' positions in a latent space, with these positions changing in time via Gaussian processes. This representation facilitates dimensionality reduction and incorporates different sources of information in the observed data. In addition, we obtain tractable procedures for posterior computation, inference, and prediction. We provide theoretical results on the flexibility of our model. Our methods are tested on simulations and on infection studies monitoring dynamic face-to-face contacts among individuals over multiple days, where they outperform current methods in inference and prediction.
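As a rough illustration of edge probabilities driven by shared and layer-specific latent positions, the sketch below assumes a logistic link in which the log-odds of an edge between actors i and j in layer k at time t equal a baseline mu(t), plus the inner product of the actors' shared positions, plus the inner product of their layer-k positions, with every coordinate path drawn from a squared-exponential Gaussian process. The kernel, its length scale, the dimensions, and all function names are assumptions made for illustration.

```python
import math
import random


def sq_exp_kernel(times, length_scale=1.0, jitter=1e-6):
    """Squared-exponential GP covariance over a time grid (jitter keeps it positive definite)."""
    return [
        [math.exp(-((s - t) ** 2) / (2 * length_scale ** 2)) + (jitter if i == j else 0.0)
         for j, t in enumerate(times)]
        for i, s in enumerate(times)
    ]


def cholesky(A):
    """Lower-triangular L with L L^T = A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_gp_path(L, rng):
    """Draw one zero-mean GP trajectory on the grid: f = L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


def edge_probability(t, i, j, layer, mu, shared, specific):
    """Assumed form: logit p = mu(t) + <shared_i(t), shared_j(t)> + <specific_i^k(t), specific_j^k(t)>."""
    eta = mu[t]
    eta += sum(path_i[t] * path_j[t] for path_i, path_j in zip(shared[i], shared[j]))
    eta += sum(path_i[t] * path_j[t]
               for path_i, path_j in zip(specific[layer][i], specific[layer][j]))
    return 1.0 / (1.0 + math.exp(-eta))


rng = random.Random(1)
times = [0.0, 1.0, 2.0, 3.0, 4.0]
K = sq_exp_kernel(times)
L = cholesky(K)
n_actors, n_layers, dim = 4, 2, 2
mu = sample_gp_path(L, rng)
# shared[i][h]: h-th coordinate path of actor i; specific[k][i][h]: its layer-k analogue.
shared = [[sample_gp_path(L, rng) for _ in range(dim)] for _ in range(n_actors)]
specific = [[[sample_gp_path(L, rng) for _ in range(dim)] for _ in range(n_actors)]
            for _ in range(n_layers)]
p = edge_probability(2, 0, 1, 1, mu, shared, specific)
print(round(p, 3))
```

The shared term lets layers borrow strength from one another, the layer-specific term captures context-dependent deviations, and the GP prior makes positions at nearby times similar, which is the intuition behind borrowing information across time.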
Submitted 30 December, 2016; v1 submitted 7 August, 2016;
originally announced August 2016.
-
Identifying heterogeneous transgenerational DNA methylation sites via clustering in beta regression
Authors:
Shengtong Han,
Hongmei Zhang,
Gabrielle A. Lockett,
Nandini Mukherjee,
John W. Holloway,
Wilfried Karmaus
Abstract:
This paper explores the transgenerational DNA methylation pattern (DNA methylation transmitted from one generation to the next) via a clustering approach. Beta regression is employed to model the transmission pattern from parents to their offspring at the population level. To this end, we propose an expectation-maximization (EM) algorithm for parameter estimation, together with the Bayesian information criterion (BIC) to determine the number of clusters. Applying our method to DNA methylation data comprising 4063 CpG sites from 41 mother-father-infant triads, we identified a set of CpG sites at which DNA methylation transmission is dominated by fathers, while at a large number of CpG sites, DNA methylation is mainly transmitted maternally to the offspring.
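Two pieces of this clustering procedure, the EM E-step and BIC-based model comparison, can be sketched for a mixture of beta regressions. The sketch assumes a logit-link regression of offspring methylation on maternal and paternal methylation with a cluster-specific precision, treats each CpG site as one unit whose triads all share a cluster, and takes n in the BIC penalty to be the number of sites. The M-step (a weighted beta-regression fit) is omitted, and all names and parameter values are hypothetical.

```python
import math


def beta_logpdf(y, mean, precision):
    """Beta log density in mean/precision form: a = mean*phi, b = (1 - mean)*phi."""
    a, b = mean * precision, (1.0 - mean) * precision
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def offspring_mean(coefs, mother, father):
    """Assumed logit link: logit(mean) = b0 + b_m * mother + b_f * father."""
    eta = coefs[0] + coefs[1] * mother + coefs[2] * father
    return 1.0 / (1.0 + math.exp(-eta))


def e_step(sites, mix_weights, components):
    """Cluster responsibilities per CpG site and the total mixture log-likelihood.

    sites: list of CpG sites, each a list of (mother, father, offspring) values in (0, 1).
    components: list of (coefs, precision) pairs, one per cluster.
    """
    responsibilities, loglik = [], 0.0
    for triads in sites:
        # log(weight_c) + sum over triads of the cluster-c beta log density.
        logs = [
            math.log(w) + sum(beta_logpdf(y, offspring_mean(coefs, m, f), phi)
                              for m, f, y in triads)
            for w, (coefs, phi) in zip(mix_weights, components)
        ]
        top = max(logs)
        log_norm = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
        responsibilities.append([math.exp(v - log_norm) for v in logs])
        loglik += log_norm
    return responsibilities, loglik


def bic(loglik, n_clusters, n_sites, n_coefs=3):
    """BIC = -2 logL + k log n, with k = per-cluster coefficients and precision plus K - 1 weights."""
    k = n_clusters * (n_coefs + 1) + (n_clusters - 1)
    return -2.0 * loglik + k * math.log(n_sites)


maternal = ((-2.0, 4.0, 0.0), 50.0)  # offspring methylation tracks the mother
paternal = ((-2.0, 0.0, 4.0), 50.0)  # offspring methylation tracks the father
parents = [(0.8, 0.2), (0.2, 0.8), (0.7, 0.3)]
site_m = [(m, f, offspring_mean(maternal[0], m, f)) for m, f in parents]
site_p = [(m, f, offspring_mean(paternal[0], m, f)) for m, f in parents]
resp, ll = e_step([site_m, site_p], [0.5, 0.5], [maternal, paternal])
print([[round(r, 3) for r in row] for row in resp], round(bic(ll, 2, n_sites=2), 2))
```

In a full EM loop, these responsibilities would weight each site's contribution to the per-cluster beta-regression fits, and the number of clusters would be chosen by comparing BIC values across candidate K.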
Submitted 9 February, 2016;
originally announced February 2016.