-
GLASD: A Loss-Function-Agnostic Global Optimizer for Robust Correlation Estimation under Data Contamination and Heavy Tails
Authors:
Priyam Das
Abstract:
Robust correlation estimation is essential in high-dimensional settings, particularly when data are contaminated by outliers or exhibit heavy-tailed behavior. Many robust loss functions of practical interest, such as those involving truncation or redescending M-estimators, lead to objective functions that are inherently non-convex and non-differentiable. Traditional methods typically focus on a single loss function tailored to a specific contamination model and develop custom algorithms tightly coupled with that loss, limiting generality and adaptability. We introduce GLASD (Global Adaptive Stochastic Descent), a general-purpose black-box optimization algorithm designed to operate over the manifold of positive definite correlation matrices. Unlike conventional solvers, GLASD requires no gradient information and imposes no assumptions of convexity or smoothness, making it well suited to optimizing a wide class of loss functions, including non-convex, non-differentiable, or discontinuous objectives. This flexibility allows GLASD to serve as a unified framework for robust estimation under arbitrary user-defined criteria. We demonstrate its effectiveness through extensive simulations involving contaminated and heavy-tailed distributions, as well as a real-data application to breast cancer proteomic network inference, where GLASD successfully identifies biologically plausible interactions despite the presence of outliers. The proposed method is scalable, constraint-aware, and available as open-source software on GitHub.
Submitted 2 June, 2025;
originally announced June 2025.
-
A2 Copula-Driven Spatial Bayesian Neural Network For Modeling Non-Gaussian Dependence: A Simulation Study
Authors:
Agnideep Aich,
Sameera Hewage,
Md Monzur Murshed,
Ashit Baran Aich,
Amanda Mayeaux,
Asim K. Dey,
Kumer P. Das,
Bruce Wade
Abstract:
In this paper, we introduce the A2 Copula Spatial Bayesian Neural Network (A2-SBNN), a predictive spatial model designed to map coordinates to continuous fields while capturing both typical spatial patterns and extreme dependencies. By embedding the novel dual-tail Archimedean copula A2 directly into the network's weight initialization, A2-SBNN naturally models complex spatial relationships, including rare co-movements in the data. The model is trained through a calibration-driven process combining Wasserstein loss, moment matching, and correlation penalties to refine predictions and manage uncertainty. Simulation results show that A2-SBNN consistently delivers high accuracy across a wide range of dependency strengths, offering a new, effective solution for spatial data modeling beyond traditional Gaussian-based approaches.
Submitted 29 May, 2025;
originally announced May 2025.
-
pared: Model selection using multi-objective optimization
Authors:
Priyam Das,
Sarah Robinson,
Christine B. Peterson
Abstract:
Motivation: Model selection is a ubiquitous challenge in statistics. For penalized models, model selection typically entails tuning hyperparameters to maximize a measure of fit or minimize out-of-sample prediction error. However, these criteria fail to reflect other desirable characteristics, such as model sparsity, interpretability, or smoothness. Results: We present the R package pared to enable the use of multi-objective optimization for model selection. Our approach entails the use of Gaussian process-based optimization to efficiently identify solutions that represent desirable trade-offs. Our implementation includes popular models with multiple objectives including the elastic net, fused lasso, fused graphical lasso, and group graphical lasso. Our R package generates interactive graphics that allow the user to identify hyperparameter values that result in fitted models which lie on the Pareto frontier. Availability: We provide the R package pared and vignettes illustrating its application to both simulated and real data at https://github.com/priyamdas2/pared.
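The notion of a Pareto frontier used in this abstract can be illustrated with a minimal Python sketch. It is not the pared package (which is written in R and uses Gaussian process-based optimization); the function names and the candidate (prediction error, number of nonzero coefficients) pairs below are hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical fitted models: (out-of-sample error, number of nonzero coefficients).
candidates = [(0.30, 12), (0.32, 8), (0.35, 8), (0.40, 3), (0.31, 15), (0.45, 3)]
frontier = pareto_frontier(candidates)
print(frontier)
```

Each point on the frontier represents a trade-off that cannot be improved in one objective without worsening the other. This brute-force filter is quadratic in the number of candidates, which is adequate for a small grid of hyperparameter values.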
Submitted 27 May, 2025;
originally announced May 2025.
-
Optimal Intervention for Self-triggering Spatial Networks with Application to Urban Crime Analytics
Authors:
Pramit Das,
Moulinath Banerjee,
Yuekai Sun
Abstract:
In many network systems, events at one node trigger further activity at other nodes, e.g., social media users reacting to each other's posts or the clustering of criminal activity in urban environments. These systems are typically referred to as self-exciting networks. In such systems, targeted intervention at critical nodes can be an effective strategy for mitigating undesirable consequences such as further propagation of criminal activity or the spreading of misinformation on social media. In our work, we develop an optimal network intervention model to explore how targeted interventions at critical nodes can mitigate cascading effects throughout a Spatiotemporal Hawkes network. Similar models have been studied previously in the literature in purely temporal Hawkes networks, but in our work, we extend them to a spatiotemporal setup and demonstrate the efficacy of our methods by comparing the post-intervention reduction in intensity to other heuristic strategies in simulated networks. Subsequently, we use our method on crime data from the LA police department database to find neighborhoods for strategic intervention to demonstrate an application in predictive policing.
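As a rough illustration of self-excitation, the sketch below evaluates the conditional intensity of a univariate, purely temporal Hawkes process with an exponential kernel, lambda(t) = mu + sum over past events s of alpha * exp(-beta * (t - s)). This is a simplification of the spatiotemporal network setting described in the abstract; the function name and parameter values are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, event_times: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity at time t given strictly earlier events.

    mu    : baseline rate
    alpha : jump in intensity caused by each event
    beta  : exponential decay rate of that jump
    """
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in event_times if s < t)
    return mu + excitation


# Hypothetical event history: the intensity is elevated shortly after a cluster of events.
history = [1.0, 1.2, 1.3]
print(hawkes_intensity(1.5, history, mu=0.5, alpha=0.8, beta=1.0))
print(hawkes_intensity(10.0, history, mu=0.5, alpha=0.8, beta=1.0))
```

Under this simplified model, an intervention could be thought of as lowering the baseline or the excitation parameter at a node, which lowers the intensity of subsequent events.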
Submitted 26 May, 2025;
originally announced May 2025.
-
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models
Authors:
Pin-Yu Chen,
Han Shen,
Payel Das,
Tianyi Chen
Abstract:
Fine-tuning Large Language Models (LLMs) on some task-specific datasets has been a primary use of LLMs. However, it has been empirically observed that this approach to enhancing capability inevitably compromises safety, a phenomenon also known as the safety-capability trade-off in LLM fine-tuning. This paper presents a theoretical framework for understanding the interplay between safety and capability in two primary safety-aware LLM fine-tuning strategies, providing new insights into the effects of data similarity, context overlap, and alignment loss landscape. Our theoretical results characterize the fundamental limits of the safety-capability trade-off in LLM fine-tuning, which are also validated by numerical experiments.
Submitted 24 March, 2025;
originally announced March 2025.
-
Developing cholera outbreak forecasting through qualitative dynamics: Insights into Malawi case study
Authors:
Adrita Ghosh,
Parthasakha Das,
Tanujit Chakraborty,
Pritha Das,
Dibakar Ghosh
Abstract:
Cholera, an acute diarrheal disease, is a serious concern in developing and underdeveloped areas. A qualitative understanding of cholera epidemics aims to foresee transmission patterns based on reported data and mechanistic models. The mechanistic model is a crucial tool for capturing the dynamics of disease transmission and population spread. However, using real-time cholera cases is essential for forecasting the transmission trend. This prospective study seeks to furnish insights into transmission trends through qualitative dynamics followed by machine learning-based forecasting. The Markov chain Monte Carlo approach is employed to calibrate the proposed mechanistic model. We identify critical parameters that illustrate the disease's dynamics using partial rank correlation coefficient-based sensitivity analysis. The basic reproduction number serves as a crucial threshold for the asymptotic dynamics. Furthermore, forward bifurcation directs the stability of the infection state, and Hopf bifurcation suggests that trends in transmission may become unpredictable as societal disinfection rates rise. Further, we develop epidemic-informed machine learning models by incorporating mechanistic cholera dynamics into autoregressive integrated moving averages and autoregressive neural networks. We forecast short-term cholera cases in Malawi using the proposed epidemic-informed machine learning models. We assert that integrating temporal dynamics into the machine learning models can enhance the capabilities of cholera forecasting models. The execution of this mechanism can significantly influence future trends in cholera transmission. This evolving approach can also be beneficial for policymakers to interpret and respond to potential disease systems. Moreover, our methodology is replicable and adaptable, encouraging future research on disease dynamics.
Submitted 18 March, 2025;
originally announced March 2025.
-
Likelihood-Free Estimation for Spatiotemporal Hawkes processes with missing data and application to predictive policing
Authors:
Pramit Das,
Moulinath Banerjee,
Yuekai Sun
Abstract:
With the growing use of AI technology, many police departments use forecasting software to predict probable crime hotspots and allocate patrolling resources effectively for crime prevention. The clustered nature of crime data makes self-exciting Hawkes processes a popular modeling choice. However, one significant challenge in fitting such models is the inherent missingness in crime data due to non-reporting, which can bias the estimated parameters of the predictive model. This leads to inaccurate downstream hotspot forecasts, often resulting in over- or under-policing in various communities, especially vulnerable ones. Our work introduces a likelihood-free approach, driven by Wasserstein Generative Adversarial Networks (WGAN), to account for unreported crimes in Spatiotemporal Hawkes models. We demonstrate through empirical analysis how this methodology improves the accuracy of parametric estimation in the presence of data missingness, leading to more reliable and efficient policing strategies.
Submitted 10 February, 2025;
originally announced February 2025.
-
B-MASTER: Scalable Bayesian Multivariate Regression for Master Predictor Discovery in Colorectal Cancer Microbiome-Metabolite Profiles
Authors:
Priyam Das,
Tanujit Dey,
Christine Peterson,
Sounak Chakraborty
Abstract:
The gut microbiome significantly influences responses to cancer therapies, including immunotherapies, primarily through its impact on the metabolome. Despite some studies on effects of specific microbial genera on individual metabolites, there is little prior work identifying key microbiome components at the genus level that shape the overall metabolome profile. To address this gap, we introduce B-MASTER (Bayesian Multivariate regression Analysis for Selecting Targeted Essential Regressors), a fully Bayesian framework with an L1 penalty to promote sparsity and an L2 penalty to shrink coefficients for non-major covariates, thereby isolating essential regressors. The method is paired with a scalable Gibbs sampling algorithm, whose computation grows linearly with the number of parameters and remains largely unaffected by sample size for models of fixed dimensions. Notably, B-MASTER enables full posterior inference for models with up to four million parameters within a practical time-frame. Its theoretical guarantees include posterior contraction, selection consistency, and robustness under mild misspecification. Using this approach, we identify key microbial genera shaping the metabolite profile, analyze their effects on the most abundant metabolites, and investigate metabolites differentially abundant in colorectal cancer (CRC) patients. These results provide foundational insights into microbiome-metabolite relationships relevant to cancer, a connection largely unexplored in existing literature.
Submitted 12 September, 2025; v1 submitted 8 December, 2024;
originally announced December 2024.
-
SMART-MC: Characterizing the Dynamics of Multiple Sclerosis Therapy Transitions Using a Covariate-Based Markov Model
Authors:
Beomchang Kim,
Zongqi Xia,
Priyam Das
Abstract:
Treatment switching is a common occurrence in the management of Multiple Sclerosis (MS), where patients transition across various disease-modifying therapies (DMTs) due to heterogeneous treatment responses, differences in disease progression, patient characteristics, and therapy-associated adverse effects. To investigate how patient-level covariates influence the likelihood of treatment transitions among DMTs, we adopt a Markovian framework, Sparse Matrix Estimation with Covariate-Based Transitions in Markov Chain Modeling (SMART-MC), in which the transition probabilities are modeled as functions of these covariates. Modeling real-world treatment transitions under this framework presents several challenges, including ensuring parameter identifiability and handling sparse transitions without overfitting. To address identifiability, we constrain each transition-specific covariate coefficient vector to have a fixed L2 norm. Furthermore, our method automatically estimates transition probabilities for sparsely observed transitions as constants and enforces zero transition probabilities for transitions that are empirically unobserved. This approach mitigates the need for additional model complexity to handle sparsity while maintaining interpretability and efficiency. To optimize the multi-modal likelihood function, we develop a scalable, parallelized global optimization routine, which is validated through benchmark comparisons and supported by key theoretical properties. Our analysis uncovers meaningful patterns in DMT transitions, revealing variations across MS patient subgroups defined by age, race, and other clinical factors.
Submitted 26 August, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Leveraging Machine Learning for Official Statistics: A Statistical Manifesto
Authors:
Marco Puts,
David Salgado,
Piet Daas
Abstract:
It is important for official statistics production to apply ML with statistical rigor, as it presents both opportunities and challenges. Although machine learning has enjoyed rapid technological advances in recent years, its application does not possess the methodological robustness necessary to produce high-quality statistical results. In order to account for all sources of error in machine learning models, the Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology. As a means of ensuring that ML models are both internally and externally valid, the TMLE model addresses issues such as representativeness and measurement errors. Several case studies are presented, illustrating the importance of applying more rigor to the application of machine learning in official statistics.
Submitted 6 September, 2024;
originally announced September 2024.
-
Bias Correction in Machine Learning-based Classification of Rare Events
Authors:
Luuk Gubbels,
Marco Puts,
Piet Daas
Abstract:
Online platform businesses can be identified by using web-scraped texts. This is a classification problem that combines elements of natural language processing and rare event detection. Because online platforms are rare, accurately identifying them with Machine Learning algorithms is challenging. Here, we describe the development of a Machine Learning-based text classification approach that reduces the number of false positives as much as possible. It greatly reduces the bias in the estimates obtained by using calibrated probabilities and ensembles.
Submitted 4 July, 2024;
originally announced July 2024.
-
Hölder regularity and roughness: construction and examples
Authors:
Erhan Bayraktar,
Purba Das,
Donghan Kim
Abstract:
We study how to construct a stochastic process on a finite interval with given 'roughness' and finite joint moments of marginal distributions. We first extend Ciesielski's isomorphism along a general sequence of partitions, and provide a characterization of Hölder regularity of a function in terms of its Schauder coefficients. Using this characterization we provide a better (pathwise) estimator of the Hölder exponent. As an additional application, we construct fake (fractional) Brownian motions whose path properties and finite moments of marginal distributions are the same as those of (fractional) Brownian motions. These belong to non-Gaussian families of stochastic processes which are statistically difficult to distinguish from real (fractional) Brownian motions.
Submitted 6 May, 2024; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Polynomial spline regression: Theory and Application
Authors:
Mithun Kumar Acharjee,
Kumer Pial Das
Abstract:
To deal with non-linear relations between the predictors and the response, we can use transformations to make the data look linear or approximately linear. In practice, however, transformation methods may be ineffective, and it may be more efficient to use flexible regression techniques that can automatically handle nonlinear behavior. One such method is Polynomial Spline (PS) regression. Because the number of possible spline regression models is large, efficient strategies for choosing the best one are required. This study investigates the different spline regression models (Polynomial Spline based on Truncated Power, B-spline, and P-Spline) in theoretical and practical ways. We focus on the fundamental concepts, as spline regression is theoretically rich. In particular, we focus on prediction using cross-validation (CV) rather than interpretation, as polynomial splines are challenging to interpret. We compare different PS models based on a real data set and conclude that the P-spline model is the best.
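The truncated power basis mentioned in this abstract can be sketched in a few lines of Python. For degree d and knots k_1, ..., k_m, the basis functions are 1, x, ..., x^d together with max(x - k_j, 0)^d for each knot. The function names, knots, and coefficients below are hypothetical and are not taken from the paper; the sketch only evaluates a spline for given coefficients and does not fit one.

```python
from typing import List, Sequence


def truncated_power_basis(x: float, knots: Sequence[float], degree: int = 3) -> List[float]:
    """Basis row [1, x, ..., x^d, (x - k_1)_+^d, ..., (x - k_m)_+^d], for degree >= 1."""
    row = [x ** p for p in range(degree + 1)]
    row += [max(x - k, 0.0) ** degree for k in knots]
    return row


def spline_value(x: float, coefs: Sequence[float],
                 knots: Sequence[float], degree: int = 3) -> float:
    """Evaluate the spline as the dot product of coefficients and the basis row."""
    basis = truncated_power_basis(x, knots, degree)
    return sum(c * b for c, b in zip(coefs, basis))


# Hypothetical quadratic spline with knots at 1 and 3.
knots = [1.0, 3.0]
coefs = [0.5, 1.0, -0.2, 0.7, 0.3]  # one coefficient per basis function
print(truncated_power_basis(2.0, knots, degree=2))
print(spline_value(2.0, coefs, knots, degree=2))
```

Stacking one basis row per observation gives a design matrix, so the spline coefficients can be estimated by ordinary least squares, with the number and placement of knots chosen by cross-validation.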
Submitted 30 December, 2022;
originally announced December 2022.
-
Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data
Authors:
Samuel C. Hoffman,
Payel Das,
Karthikeyan Shanmugam,
Kahini Wadhawan,
Prasanna Sattigeri
Abstract:
Training generative models that capture rich semantics of the data and interpreting the latent representations encoded by such models are very important problems in un-/self-supervised learning. In this work, we provide a simple algorithm that relies on perturbation experiments on latent codes of a pre-trained generative autoencoder to uncover an attribute graph that is implied by the generative model. We perform perturbation experiments to check for influence of a given latent variable on a subset of attributes. Given this, we show that one can fit an effective graphical model that models a structural equation model between latent codes taken as exogenous variables and attributes taken as observed variables. One interesting aspect is that a single latent variable controls multiple overlapping subsets of attributes unlike conventional approaches that try to impose full independence. Using a pre-trained generative autoencoder trained on a large dataset of small molecules, we demonstrate that the graphical model between various molecular attributes and latent codes learned by our algorithm can be used to predict a specific property for molecules which are drawn from a different distribution. We compare prediction models trained on various feature subsets chosen by simple baselines, as well as existing causal discovery and sparse learning/feature selection methods, with the ones in the derived Markov blanket from our method. Results show empirically that the predictor that relies on our Markov blanket attributes is robust to distribution shifts when transferred or fine-tuned with a few samples from the new distribution, especially when training data is limited.
Submitted 29 August, 2024; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Accelerating Inhibitor Discovery With A Deep Generative Foundation Model: Validation for SARS-CoV-2 Drug Targets
Authors:
Vijil Chenthamarakshan,
Samuel C. Hoffman,
C. David Owen,
Petra Lukacik,
Claire Strain-Damerell,
Daren Fearon,
Tika R. Malla,
Anthony Tumber,
Christopher J. Schofield,
Helen M. E. Duyvesteyn,
Wanwisa Dejnirattisai,
Loic Carrique,
Thomas S. Walter,
Gavin R. Screaton,
Tetiana Matviiuk,
Aleksandra Mojsilovic,
Jason Crain,
Martin A. Walsh,
David I. Stuart,
Payel Das
Abstract:
The discovery of novel inhibitor molecules for emerging drug-target proteins is widely acknowledged as a challenging inverse design problem: Exhaustive exploration of the vast chemical search space is impractical, especially when the target structure or active molecules are unknown. Here we validate experimentally the broad utility of a deep generative framework trained at-scale on protein sequences, small molecules, and their mutual interactions -- that is unbiased toward any specific target. As demonstrators, we consider two dissimilar and relevant SARS-CoV-2 targets: the main protease and the spike protein (receptor binding domain, RBD). To perform target-aware design of novel inhibitor molecules, a protein sequence-conditioned sampling on the generative foundation model is performed. Despite using only the target sequence information, and without performing any target-specific adaptation of the generative model, micromolar-level inhibition was observed in in vitro experiments for two candidates out of only four synthesized for each target. The most potent spike RBD inhibitor also exhibited activity against several variants in live virus neutralization assays. These results therefore establish that a single, broadly deployable generative foundation model for accelerated hit discovery is effective and efficient, even in the most general case where neither target structure nor binder information is available.
Submitted 14 October, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
A Comparative Study on Forecasting of Retail Sales
Authors:
Md Rashidul Hasan,
Muntasir A Kabir,
Rezoan A Shuvro,
Pankaz Das
Abstract:
Predicting product sales of large retail companies is a challenging task given the volatile nature of trends, seasonalities, and events, as well as unknown factors such as market competition, changes in customers' preferences, or unforeseen events, e.g., the COVID-19 outbreak. In this paper, we benchmark forecasting models on historical sales data from Walmart to predict their future sales. We provide a comprehensive theoretical overview and analysis of the state-of-the-art time series forecasting models. Then, we apply these models to the forecasting challenge dataset (M5 forecasting by Kaggle). Specifically, we use a traditional model, namely ARIMA (Autoregressive Integrated Moving Average), and recently developed advanced models, e.g., the Prophet model developed by Facebook and the light gradient boosting machine (LightGBM) model developed by Microsoft, and benchmark their performance. Results suggest that the ARIMA model outperforms the Facebook Prophet and LightGBM models, while the LightGBM model achieves a huge computational gain for the large dataset with a negligible compromise in prediction accuracy.
Submitted 14 March, 2022;
originally announced March 2022.
-
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination
Authors:
Arpan Mukherjee,
Ali Tajer,
Pin-Yu Chen,
Payel Das
Abstract:
This paper investigates the problem of best arm identification in $\textit{contaminated}$ stochastic multi-arm bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability $\varepsilon$. A fixed confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits. These algorithms involve mean estimates that achieve the optimal error guarantee on the deviation of the true mean from the estimate asymptotically. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for the gap-based algorithm, the sample complexity is asymptotically optimal up to constant factors, while for the successive elimination-based algorithm, it is optimal up to logarithmic factors. Finally, numerical experiments are provided to illustrate the gains of the algorithms compared to the existing baselines.
Submitted 14 November, 2021;
originally announced November 2021.
-
Non-Asymptotic Guarantees for Reliable Identification of Granger Causality via the LASSO
Authors:
Proloy Das,
Behtash Babadi
Abstract:
Granger causality is among the widely used data-driven approaches for causal analysis of time series data with applications in various areas including economics, molecular biology, and neuroscience. Two of the main challenges of this methodology are: 1) over-fitting as a result of limited data duration, and 2) correlated process noise as a confounding factor, both leading to errors in identifying the causal influences. Sparse estimation via the LASSO has successfully addressed these challenges for parameter estimation. However, the classical statistical tests for Granger causality resort to asymptotic analysis of ordinary least squares, which require long data duration to be useful and are not immune to confounding effects. In this work, we address this disconnect by introducing a LASSO-based statistic and studying its non-asymptotic properties under the assumption that the true models admit sparse autoregressive representations. We establish fundamental limits for reliable identification of Granger causal influences using the proposed LASSO-based statistic. We further characterize the false positive error probability and test power of a simple thresholding rule for identifying Granger causal effects and provide two methods to set the threshold in a data-driven fashion. We present simulation studies and application to real data to compare the performance of our proposed method to ordinary least squares and existing LASSO-based methods in detecting Granger causal influences, which corroborate our theoretical results.
Submitted 14 July, 2023; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Unbiased Estimations based on Binary Classifiers: A Maximum Likelihood Approach
Authors:
Marco J. H. Puts,
Piet J. H. Daas
Abstract:
Binary classifiers trained on a certain proportion of positive items introduce a bias when applied to data sets with different proportions of positive items. Most solutions for dealing with this issue assume that some information on the latter distribution is known. However, this is not always the case, certainly when this proportion is the target variable. In this paper a maximum likelihood estimator for the true proportion of positives in data sets is suggested and tested on synthetic and real world data.
Submitted 17 February, 2021;
originally announced February 2021.
-
How do mobility restrictions and social distancing during COVID-19 affect the crude oil price?
Authors:
Asim K. Dey,
Kumer P. Das
Abstract:
We develop an air mobility index and use Apple's newly developed driving trend index to evaluate the impact of COVID-19 on the crude oil price. We use quantile regression and stationary and non-stationary extreme value models to study the impact. We find that both the \textit{air mobility index} and \textit{driving trend index} significantly influence lower and upper quantiles as well as the median of the WTI crude oil price. The extreme value model suggests that an event like COVID-19 may push oil prices into negative territory again as air mobility decreases drastically during such pandemics.
Submitted 1 January, 2021;
originally announced January 2021.
-
Optimizing Mode Connectivity via Neuron Alignment
Authors:
N. Joseph Tatro,
Pin-Yu Chen,
Payel Das,
Igor Melnyk,
Prasanna Sattigeri,
Rongjie Lai
Abstract:
The loss landscapes of deep neural networks are not well understood due to their high nonconvexity. Empirically, the local minima of these loss functions can be connected by a learned curve in model space, along which the loss remains nearly constant; a feature known as mode connectivity. Yet, current curve finding algorithms do not consider the influence of symmetry in the loss surface created by model weight permutations. We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected. To approximate the optimal permutation, we introduce an inexpensive heuristic referred to as neuron alignment. Neuron alignment promotes similarity between the distribution of intermediate activations of models along the curve. We provide theoretical analysis establishing the benefit of alignment to mode connectivity based on this simple heuristic. We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme. Empirically, optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes. Our alignment method can significantly alleviate the recently identified robust loss barrier on the path connecting two adversarial robust models and find more robust and accurate models on the path.
Submitted 2 November, 2020; v1 submitted 4 September, 2020;
originally announced September 2020.
-
A Machine Learning Approach for Modelling Parking Duration in Urban Land-use
Authors:
Janak Parmar,
Pritikana Das,
Sanjaykumar Dave
Abstract:
Parking is an inevitable issue in fast-growing developing countries. An increasing number of vehicles requires more and more urban land to be allocated for parking. However, little attention has been paid to parking issues in developing countries like India. This study proposes a model for analysing the influence of car users' socioeconomic and travel characteristics on parking duration. Specifically, artificial neural networks (ANNs) are deployed to capture the interrelationship between driver characteristics and parking duration. ANNs are highly efficient in learning and recognizing connections between parameters for best prediction of an outcome. Since the utility of ANNs has been critically limited by their black-box nature, the study uses the Garson algorithm and local interpretable model-agnostic explanations (LIME) for model interpretation. LIME explains the prediction for any classification by approximating it locally with the developed interpretable model. This study is based on microdata collected on-site through interview surveys considering two land-uses: office-business and market/shopping. Results revealed a higher probability of prediction through LIME, and therefore the methodology can be adopted ubiquitously. Further, the policy implications are discussed based on the results for both land-uses. This unique study could lead to enhanced parking policy and management to achieve the sustainability goals.
Submitted 10 October, 2023; v1 submitted 4 August, 2020;
originally announced August 2020.
-
On Second order correctness of Bootstrap in Logistic Regression
Authors:
Debraj Das,
Priyam Das
Abstract:
In fields such as clinical trials, biomedical surveys, marketing, and banking, where the response variable is dichotomous, logistic regression is considered a convenient alternative to linear regression. In this paper, we develop a novel bootstrap technique based on the perturbation resampling method for approximating the distribution of the maximum likelihood estimator (MLE) of the regression parameter vector. We establish second order correctness of the proposed bootstrap method after proper studentization and smoothing. It is shown that inferences drawn based on the proposed bootstrap method are more accurate than those based on asymptotic normality. The main challenge in establishing second order correctness lies in the fact that, the response variable being binary, the resulting MLE has a lattice structure. We show that the direct bootstrapping approach fails even after studentization. We adopt the smoothing technique developed in Lahiri (1993) to ensure that the smoothed studentized version of the MLE has a density. A similar smoothing strategy is applied to the bootstrap version as well to achieve a second order correct approximation.
Submitted 18 September, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Combinatorial Black-Box Optimization with Expert Advice
Authors:
Hamid Dadkhahi,
Karthikeyan Shanmugam,
Jesus Rios,
Payel Das,
Samuel Hoffman,
Troy David Loeffler,
Subramanian Sankaranarayanan
Abstract:
We consider the problem of black-box function optimization over the boolean hypercube. Despite the vast literature on black-box function optimization over continuous domains, not much attention has been paid to learning models for optimization over combinatorial domains until recently. However, the computational complexity of the recently devised algorithms is prohibitive even for moderate numbers of variables; drawing one sample using the existing algorithms is more expensive than a function evaluation for many black-box functions of interest. To address this problem, we propose a computationally efficient model learning algorithm based on multilinear polynomials and exponential weight updates. In the proposed algorithm, we alternate between simulated annealing with respect to the current polynomial representation and updating the weights using monomial experts' advice. Numerical experiments on various datasets in both unconstrained and sum-constrained boolean optimization indicate the competitive performance of the proposed algorithm, while improving the computational time by up to several orders of magnitude compared to state-of-the-art algorithms in the literature.
Submitted 13 October, 2020; v1 submitted 6 June, 2020;
originally announced June 2020.
-
Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics
Authors:
Payel Das,
Tom Sercu,
Kahini Wadhawan,
Inkit Padhi,
Sebastian Gehrmann,
Flaviu Cipcigan,
Vijil Chenthamarakshan,
Hendrik Strobelt,
Cicero dos Santos,
Pin-Yu Chen,
Yi Yan Yang,
Jeremy Tan,
James Hedrick,
Jason Crain,
Aleksandra Mojsilovic
Abstract:
De novo therapeutic design is challenged by a vast chemical repertoire and multiple constraints, e.g., high broad-spectrum potency and low toxicity. We propose CLaSS (Controlled Latent attribute Space Sampling) - an efficient computational method for attribute-controlled generation of molecules, which leverages guidance from classifiers trained on an informative latent space of molecules modeled using a deep generative autoencoder. We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations. The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency, which are emerging drug candidates for tackling antibiotic resistance. Synthesis and testing of only twenty designed sequences identified two novel and minimalist AMPs with high potency against diverse Gram-positive and Gram-negative pathogens, including one multidrug-resistant and one antibiotic-resistant K. pneumoniae, via membrane pore formation. Both antimicrobials exhibit low in vitro and in vivo toxicity and mitigate the onset of drug resistance. The proposed approach thus presents a viable path for faster and efficient discovery of potent and selective broad-spectrum antimicrobials.
Submitted 25 February, 2021; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness
Authors:
Pu Zhao,
Pin-Yu Chen,
Payel Das,
Karthikeyan Natesan Ramamurthy,
Xue Lin
Abstract:
Mode connectivity provides novel geometric insights on analyzing loss landscapes and enables building high-accuracy pathways between well-trained neural networks. In this work, we propose to employ mode connectivity in loss landscapes to study the adversarial robustness of deep neural networks, and provide novel methods for improving this robustness. Our experiments cover various types of adversarial attacks applied to different network architectures and datasets. When network models are tampered with by backdoor or error-injection attacks, our results demonstrate that the path connection learned using a limited amount of bona fide data can effectively mitigate adversarial effects while maintaining the original accuracy on clean data. Therefore, mode connectivity provides users with the power to repair backdoored or error-injected models. We also use mode connectivity to investigate the loss landscapes of regular and robust models against evasion attacks. Experiments show that there exists a barrier in adversarial robustness loss on the path connecting regular and adversarially-trained models. A high correlation is observed between the adversarial robustness loss and the largest eigenvalue of the input Hessian matrix, for which theoretical justifications are provided. Our results suggest that mode connectivity offers a holistic tool and practical means for evaluating and improving adversarial robustness.
Submitted 2 July, 2020; v1 submitted 30 April, 2020;
originally announced May 2020.
-
CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models
Authors:
Vijil Chenthamarakshan,
Payel Das,
Samuel C. Hoffman,
Hendrik Strobelt,
Inkit Padhi,
Kar Wai Lim,
Benjamin Hoover,
Matteo Manica,
Jannis Born,
Teodoro Laino,
Aleksandra Mojsilovic
Abstract:
The novel nature of SARS-CoV-2 calls for the development of efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme that uses guidance from attribute predictors trained on latent features. To generate novel and optimal drug-like molecules for unseen viral targets, CogMol leverages a protein-molecule binding affinity predictor that is trained using SMILES VAE embeddings and protein sequence embeddings learned unsupervised from a large corpus. The CogMol framework is applied to three SARS-CoV-2 target proteins: main protease, receptor-binding domain of the spike protein, and non-structural protein 9 replicase. The generated candidates are novel at both molecular and chemical scaffold levels when compared to the training data. CogMol also includes in silico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations. Docking reveals favorable binding of generated molecules to the target protein structure, where 87-95% of high affinity molecules showed docking free energy < -6 kcal/mol. When compared to approved drugs, the majority of designed compounds show low parent molecule and metabolite toxicity and high synthetic feasibility. In summary, CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity, and does not need target-dependent fine-tuning of the framework or target structure information.
Submitted 23 June, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Early Response Assessment in Lung Cancer Patients using Spatio-temporal CBCT Images
Authors:
Bijju Kranthi Veduruparthi,
Jayanta Mukherjee,
Partha Pratim Das,
Mandira Saha,
Sanjoy Chatterjee,
Raj Kumar Shrimali,
Soumendranath Ray,
Sriram Prasath
Abstract:
We report a model to predict a patient's radiological response to curative radiation therapy (RT) for non-small-cell lung cancer (NSCLC).
Cone-Beam Computed Tomography images acquired weekly during the six-week course of RT were contoured with the Gross Tumor Volume (GTV) by senior radiation oncologists for 53 patients (7 images per patient).
Deformable registration of the images yielded six deformation fields for each pair of consecutive images per patient.
The Jacobian of a field provides a measure of local expansion/contraction and is used in our model.
Delineations were compared post-registration to compute unchanged ($U$), newly grown ($G$), and reduced ($R$) regions within GTV.
The mean Jacobians of these regions, $\mu_U$, $\mu_G$, and $\mu_R$, are statistically compared and a response assessment model is proposed.
A good response is hypothesized if $\mu_R < 1.0$, $\mu_R < \mu_U$, and $\mu_G < \mu_U$.
For early prediction of post-treatment response, the first three weeks' images are used.
Our model predicted clinical response with a precision of $74\%$.
Using reduction in CT numbers (CTN) and percentage GTV reduction as features in logistic regression yielded an area-under-curve of 0.65 with p=0.005.
Combining the logistic regression model with the proposed hypothesis yielded an odds ratio of 20.0 (p=0.0).
Submitted 7 March, 2020;
originally announced March 2020.
-
Novel Radiomic Feature for Survival Prediction of Lung Cancer Patients using Low-Dose CBCT Images
Authors:
Bijju Kranthi Veduruparthi,
Jayanta Mukherjee,
Partha Pratim Das,
Moses Arunsingh,
Raj Kumar Shrimali,
Sriram Prasath,
Soumendranath Ray,
Sanjay Chatterjee
Abstract:
Prediction of survivability in a patient for tumor progression is useful to estimate the effectiveness of a treatment protocol. In our work, we present a model to take into account the heterogeneous nature of a tumor to predict survival. The tumor heterogeneity is measured in terms of its mass by combining information regarding the radiodensity obtained in images with the gross tumor volume (GTV). We propose a novel feature called Tumor Mass within a GTV (TMG) that improves the prediction of survivability compared to existing models which use GTV. Weekly variation in TMG of a patient is computed from the image data and also estimated from a cell survivability model. The parameters obtained from the cell survivability model are indicative of changes in TMG over the treatment period. We use these parameters along with other patient metadata to perform survival analysis and regression. Cox's Proportional Hazard survival regression was performed using these data. Significant improvement in the average concordance index from 0.47 to 0.64 was observed when TMG is used in the model instead of GTV. The experiments show that there is a difference in the treatment response in responsive and non-responsive patients and that the proposed method can be used to predict patient survivability.
Submitted 7 March, 2020;
originally announced March 2020.
-
Improving Efficiency in Large-Scale Decentralized Distributed Training
Authors:
Wei Zhang,
Xiaodong Cui,
Abdullah Kayi,
Mingrui Liu,
Ulrich Finkler,
Brian Kingsbury,
George Saon,
Youssef Mroueh,
Alper Buyuktosunoglu,
Payel Das,
David Kung,
Michael Picheny
Abstract:
Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchronous Parallel SGD (AD-PSGD) is a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks. One drawback of (A)D-PSGD is that the spectral gap of the mixing matrix decreases when the number of learners in the system increases, which hampers convergence. In this paper, we investigate techniques to accelerate (A)D-PSGD based training by improving the spectral gap while minimizing the communication cost. We demonstrate the effectiveness of our proposed techniques by running experiments on the 2000-hour Switchboard speech recognition task and the ImageNet computer vision task. On an IBM P9 supercomputer, our system is able to train an LSTM acoustic model in 2.28 hours with 7.5% WER on the Hub5-2000 Switchboard (SWB) test set and 13.3% WER on the CallHome (CH) test set using 64 V100 GPUs and in 1.98 hours with 7.7% WER on SWB and 13.3% WER on CH using 128 V100 GPUs, the fastest training time reported to date.
Submitted 3 February, 2020;
originally announced February 2020.
-
Design, Benchmarking and Explainability Analysis of a Game-Theoretic Framework towards Energy Efficiency in Smart Infrastructure
Authors:
Ioannis C. Konstantakopoulos,
Hari Prasanna Das,
Andrew R. Barkan,
Shiying He,
Tanya Veeravalli,
Huihan Liu,
Aummul Baneen Manasawala,
Yu-Wen Lin,
Costas J. Spanos
Abstract:
In this paper, we propose a gamification approach as a novel framework for smart building infrastructure with the goal of motivating human occupants to reconsider personal energy usage and to have positive effects on their environment. Human interaction in the context of cyber-physical systems is a core component and consideration in the implementation of any smart building technology. Research has shown that the adoption of human-centric building services and amenities leads to improvements in the operational efficiency of these cyber-physical systems directed towards controlling building energy usage. We introduce a strategy in the form of a game-theoretic framework that incorporates humans-in-the-loop modeling by creating an interface to allow building managers to interact with occupants and potentially incentivize energy efficient behavior. Prior works on game theoretic analysis typically rely on the assumption that the utility function of each individual agent is known a priori. Instead, we propose a novel utility learning framework for benchmarking that employs robust estimations of occupant actions towards energy efficiency. To improve forecasting performance, we extend the utility learning scheme by leveraging deep bi-directional recurrent neural networks. Using the proposed methods on data gathered from occupant actions for resources such as room lighting, we forecast patterns of energy resource usage to demonstrate the prediction performance of the methods. The results of our study show that we can achieve a highly accurate representation of the ground truth for occupant energy resource usage. We also demonstrate the explainable nature of human decision making towards energy usage inherent in the dataset using graphical lasso and Granger causality algorithms. Finally, we open source the de-identified, high-dimensional data pertaining to the energy game-theoretic framework.
Submitted 16 October, 2019;
originally announced October 2019.
-
A Novel Graphical Lasso based approach towards Segmentation Analysis in Energy Game-Theoretic Frameworks
Authors:
Hari Prasanna Das,
Ioannis C. Konstantakopoulos,
Aummul Baneen Manasawala,
Tanya Veeravalli,
Huihan Liu,
Costas J. Spanos
Abstract:
Energy game-theoretic frameworks have emerged as a successful strategy to encourage energy efficient behavior at large scale by leveraging a human-in-the-loop strategy. A number of such frameworks have been introduced over the years which formulate the energy saving process as a competitive game with appropriate incentives for energy efficient players. However, prior works involve an incentive design mechanism which is dependent on knowledge of utility functions for all the players in the game, which is hard to compute especially when the number of players is high, as is common in energy game-theoretic frameworks. Our research proposes that the utilities of players in such a framework can be grouped together into a relatively small number of clusters, and the clusters can then be targeted with tailored incentives. The key to the above segmentation analysis is to learn the features leading to human decision making towards energy usage in competitive environments. We propose a novel graphical lasso based approach to perform such segmentation, by studying the feature correlations in a real-world energy social game dataset. To further improve the explainability of the model, we perform a causality study using Granger causality. The proposed segmentation analysis results in characteristic clusters demonstrating different energy usage behaviors. We also present avenues to implement intelligent incentive design using the proposed segmentation method.
Submitted 5 October, 2019;
originally announced October 2019.
-
Estimating the Optimal Linear Combination of Biomarkers using Spherically Constrained Optimization
Authors:
Priyam Das,
Debsurya De,
Raju Maiti,
Mona Kamal,
Katherine A. Hutcheson,
Clifton D. Fuller,
Bibhas Chakraborty,
Christine B. Peterson
Abstract:
In the context of a binary classification problem, the optimal linear combination of continuous predictors can be estimated by maximizing an empirical estimate of the area under the receiver operating characteristic (ROC) curve (AUC). For multi-category responses, the optimal predictor combination can similarly be obtained by maximization of the empirical hypervolume under the manifold (HUM). This problem is particularly relevant to medical research, where it may be of interest to diagnose a disease with various subtypes or predict a multi-category outcome. Since the empirical HUM is discontinuous, non-differentiable, and possibly multi-modal, solving this maximization problem requires a global optimization technique. Estimation of the optimal coefficient vector using existing global optimization techniques is computationally expensive, becoming prohibitive as the number of predictors and the number of outcome categories increases. We propose an efficient derivative-free black-box optimization technique based on pattern search to solve this problem. Through extensive simulation studies, we demonstrate that the proposed method achieves better performance compared to existing methods including the step-down algorithm. Finally, we illustrate the proposed method to predict swallowing difficulty after radiation therapy for oropharyngeal cancer based on radiation dose to various structures in the head and neck.
Submitted 7 January, 2021; v1 submitted 6 September, 2019;
originally announced September 2019.
-
Likelihood Contribution based Multi-scale Architecture for Generative Flows
Authors:
Hari Prasanna Das,
Pieter Abbeel,
Costas J. Spanos
Abstract:
Deep generative modeling using flows has gained popularity owing to the tractable exact log-likelihood estimation with efficient training and synthesis process. However, flow models suffer from the challenge of having high dimensional latent space, the same in dimension as the input space. An effective solution to the above challenge as proposed by Dinh et al. (2016) is a multi-scale architecture, which is based on iterative early factorization of a part of the total dimensions at regular intervals. Prior works on generative flow models involving a multi-scale architecture perform the dimension factorization based on static masking. We propose a novel multi-scale architecture that performs data-dependent factorization to decide which dimensions should pass through more flow layers. To facilitate the same, we introduce a heuristic based on the contribution of each dimension to the total log-likelihood which encodes the importance of the dimensions. Our proposed heuristic is readily obtained as part of the flow training process, enabling the versatile implementation of our likelihood contribution based multi-scale architecture for generic flow models. We present such implementations for several state-of-the-art flow models and demonstrate improvements in log-likelihood score and sampling quality on standard image benchmarks. We also conduct ablation studies to compare the proposed method with other options for dimension factorization.
Submitted 27 January, 2022; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Multitaper Spectral Analysis of Neuronal Spiking Activity Driven by Latent Stationary Processes
Authors:
Proloy Das,
Behtash Babadi
Abstract:
Investigating the spectral properties of the neural covariates that underlie spiking activity is an important problem in systems neuroscience, as it allows one to study the role of brain rhythms in cognitive functions. While the spectral estimation of continuous time-series is a well-established domain, computing the spectral representation of these neural covariates from spiking data poses various challenges due to the intrinsic non-linearities involved. In this paper, we address this problem by proposing a variant of the multitaper method specifically tailored for point process data. To this end, we construct auxiliary spiking statistics from which the eigen-spectra of the underlying latent process can be directly inferred using maximum likelihood estimation, and thereby the multitaper estimate can be efficiently computed. Comparison of our proposed technique to existing methods using simulated data reveals significant gains in terms of the bias-variance trade-off.
Submitted 20 June, 2019;
originally announced June 2019.
-
A distribution-free smoothed combination method of biomarkers to improve diagnostic accuracy in multi-category classification
Authors:
Raju Maiti,
Jialiang Li,
Priyam Das,
Lei Feng,
Derek Hausenloy,
Bibhas Chakraborty
Abstract:
Results from multiple diagnostic tests are usually combined to improve the overall diagnostic accuracy. For binary classification, maximization of the empirical estimate of the area under the receiver operating characteristic (ROC) curve is widely adopted to produce the optimal linear combination of multiple biomarkers. In the presence of a large number of biomarkers, this method proves to be computationally expensive and difficult to implement, since it involves maximization of a discontinuous, non-smooth function for which gradient-based methods cannot be used directly. The complexity of this problem increases when the classification problem becomes multi-category. In this article, we develop a linear combination method that maximizes a smooth approximation of the empirical Hypervolume Under Manifolds (HUM) for multi-category outcomes. We approximate HUM by replacing the indicator function with the sigmoid function or the normal cumulative distribution function (CDF). With these smooth approximations, efficient gradient-based algorithms can be employed to obtain better solutions with less computing time. We show that, under some regularity conditions, the proposed method yields consistent estimates of the coefficient parameters. We also derive the asymptotic normality of the coefficient estimates. We conduct extensive simulations to examine our methods. Under different simulation scenarios, the proposed methods are compared with other existing methods and are shown to outperform them in terms of diagnostic accuracy. The proposed method is illustrated using two real medical data sets.
Submitted 22 April, 2019;
originally announced April 2019.
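The smoothing idea in the abstract above can be sketched for the binary (AUC) case, which is the two-category special case of HUM. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the bandwidth `h`, and the choice to count ties as incorrect are all hypothetical.

```python
import numpy as np

def pairwise_score_diff(beta, X_pos, X_neg):
    """Differences of linear scores for every (positive, negative) pair."""
    s_pos = X_pos @ beta
    s_neg = X_neg @ beta
    return s_pos[:, None] - s_neg[None, :]

def empirical_auc(beta, X_pos, X_neg):
    """Empirical AUC: fraction of pairs ranked correctly (ties count as wrong here).

    This is a step function of beta, so gradient-based methods cannot be applied.
    """
    return np.mean(pairwise_score_diff(beta, X_pos, X_neg) > 0)

def smoothed_auc(beta, X_pos, X_neg, h=0.1):
    """Sigmoid-smoothed AUC: the indicator 1{d > 0} is replaced by sigmoid(d / h).

    h is an assumed bandwidth; smaller h tracks the indicator more closely.
    The tanh form is a numerically stable way to evaluate the sigmoid.
    """
    d = pairwise_score_diff(beta, X_pos, X_neg)
    return np.mean(0.5 * (1.0 + np.tanh(d / (2.0 * h))))
```

Because `smoothed_auc` is differentiable in `beta`, it could be passed to a gradient-based optimizer, whereas `empirical_auc` could not.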
-
NExUS: Bayesian simultaneous network estimation across unequal sample sizes
Authors:
Priyam Das,
Christine Peterson,
Kim-Anh Do,
Rehan Akbani,
Veerabhadran Baladandayuthapani
Abstract:
Network-based analyses of high-throughput genomics data provide a holistic, systems-level understanding of various biological mechanisms for a common population. However, when estimating multiple networks across heterogeneous sub-populations, varying sample sizes pose a challenge in the estimation and inference, as network differences may be driven by differences in power. We are particularly interested in addressing this challenge in the context of proteomic networks for related cancers, as the number of subjects available for rare cancer (sub-)types is often limited. We develop NExUS (Network Estimation across Unequal Sample sizes), a Bayesian method that enables joint learning of multiple networks while avoiding artefactual relationship between sample size and network sparsity. We demonstrate through simulations that NExUS outperforms existing network estimation methods in this context, and apply it to learn network similarity and shared pathway activity for groups of cancers with related origins represented in The Cancer Genome Atlas (TCGA) proteomic data.
Submitted 6 November, 2018;
originally announced November 2018.
-
Segmentation Analysis in Human Centric Cyber-Physical Systems using Graphical Lasso
Authors:
Hari Prasanna Das,
Ioannis C. Konstantakopoulos,
Aummul Baneen Manasawala,
Tanya Veeravalli,
Huihan Liu,
Costas J. Spanos
Abstract:
A generalized gamification framework is introduced as a form of smart infrastructure with potential to improve sustainability and energy efficiency by leveraging a humans-in-the-loop strategy. The proposed framework enables a Human-Centric Cyber-Physical System using an interface to allow building managers to interact with occupants. The interface is designed for occupant engagement-integration, supporting learning of their preferences over resources in addition to understanding how preferences change as a function of external stimuli such as physical control, time, or incentives. Towards intelligent and autonomous incentive design, a novel statistical learning algorithm that segments occupants' energy usage behavior is proposed. We apply the proposed algorithm, Graphical Lasso, to the occupants' energy resource usage data to obtain feature correlations--dependencies. Segmentation analysis results in characteristic clusters demonstrating different energy usage behaviors. The features--factors characterizing human decision-making are made explainable.
Submitted 16 January, 2019; v1 submitted 24 October, 2018;
originally announced October 2018.
-
PepCVAE: Semi-Supervised Targeted Design of Antimicrobial Peptide Sequences
Authors:
Payel Das,
Kahini Wadhawan,
Oscar Chang,
Tom Sercu,
Cicero Dos Santos,
Matthew Riemer,
Vijil Chenthamarakshan,
Inkit Padhi,
Aleksandra Mojsilovic
Abstract:
Given the emerging global threat of antimicrobial resistance, new methods for next-generation antimicrobial design are urgently needed. We report a peptide generation framework PepCVAE, based on a semi-supervised variational autoencoder (VAE) model, for designing novel antimicrobial peptide (AMP) sequences. Our model learns a rich latent space of the biological peptide context by taking advantage of abundant, unlabeled peptide sequences. The model further learns a disentangled antimicrobial attribute space by using the feedback from a jointly trained AMP classifier that uses limited labeled instances. The disentangled representation allows for controllable generation of AMPs. Extensive analysis of the PepCVAE-generated sequences reveals superior performance of our model in comparison to a plain VAE, as PepCVAE generates novel AMP sequences with higher long-range diversity, while being closer to the training distribution of biological peptides. These features are highly desired in next-generation antimicrobial design.
Submitted 13 November, 2018; v1 submitted 17 October, 2018;
originally announced October 2018.
-
Autism Classification Using Brain Functional Connectivity Dynamics and Machine Learning
Authors:
Ravi Tejwani,
Adam Liska,
Hongyuan You,
Jenna Reinen,
Payel Das
Abstract:
The goal of the present study is to identify autism using machine learning techniques and resting-state brain imaging data, leveraging the temporal variability of the functional connections (FC) as the only information. We estimated and compared the FC variability across brain regions between typical, healthy subjects and the autistic population by analyzing brain imaging data from a world-wide multi-site database known as ABIDE (Autism Brain Imaging Data Exchange). Our analysis revealed that patients diagnosed with autism spectrum disorder (ASD) show increased FC variability in several brain regions that are associated with low FC variability in the typical brain. We then used the enhanced FC variability of brain regions as features for training machine learning models for ASD classification and achieved 65% accuracy in identification of ASD versus control subjects within the dataset. We also used node strength, estimated from the number of functional connections per node averaged over the whole scan, as features for ASD classification. The results reveal that the dynamic FC measures outperform or are comparable with the static FC measures in predicting ASD.
Submitted 21 December, 2017;
originally announced December 2017.
-
Neurology-as-a-Service for the Developing World
Authors:
Tejas Dharamsi,
Payel Das,
Tejaswini Pedapati,
Gregory Bramble,
Vinod Muthusamy,
Horst Samulowitz,
Kush R. Varshney,
Yuvaraj Rajamanickam,
John Thomas,
Justin Dauwels
Abstract:
Electroencephalography (EEG) is an extensively-used and well-studied technique in the field of medical diagnostics and treatment for brain disorders, including epilepsy, migraines, and tumors. The analysis and interpretation of EEGs require physicians to have specialized training, which is not common even among most doctors in the developed world, let alone the developing world where physician shortages plague society. This problem can be addressed by teleEEG that uses remote EEG analysis by experts or by local computer processing of EEGs. However, both of these options are prohibitively expensive and the second option requires abundant computing resources and infrastructure, which is another concern in developing countries where there are resource constraints on capital and computing infrastructure. In this work, we present a cloud-based deep neural network approach to provide decision support for non-specialist physicians in EEG analysis and interpretation. Named `neurology-as-a-service,' the approach requires almost no manual intervention in feature engineering and in the selection of an optimal architecture and hyperparameters of the neural network. In this study, we deploy a pipeline that includes moving EEG data to the cloud and getting optimal models for various classification tasks. Our initial prototype has been tested only in developed world environments to-date, but our intention is to test it in developing world environments in future work. We demonstrate the performance of our proposed approach using the BCI2000 EEG MMI dataset, on which our service attains 63.4% accuracy for the task of classifying real vs. imaginary activity performed by the subject, which is significantly higher than what is obtained with a shallow approach such as support vector machines.
Submitted 21 November, 2017; v1 submitted 16 November, 2017;
originally announced November 2017.
-
Analysis of Deformation Fields in Spatio-temporal CBCT images of lungs for radiotherapy patients
Authors:
Bijju Kranthi Veduruparthi,
Jayanta Mukherjee,
Partha Pratim Das,
Mandira Saha,
Raj Kumar Shrimali,
Sanjoy Chatterjee,
Soumendranath Ray,
Sriram Prasath
Abstract:
Deformable registration of spatiotemporal Cone-Beam Computed Tomography (CBCT) images taken sequentially during the radiation treatment course yields a deformation field for a pair of images. The Jacobian of this field at any voxel provides a measure of the expansion or contraction of a unit volume. We analyze the Jacobian at different sections of the tumor volumes obtained from delineation done by radiation oncologists for lung cancer patients. The delineations across the temporal sequence are compared after registration to classify tumor areas as unchanged (U), newly grown (G), or reduced (R). These three regions of the tumor are considered for statistical analysis. In addition, statistics of non-tumor (N) regions are taken into consideration. Sequential CBCT images of 29 patients were used in studying the distribution of the Jacobian in these four regions, along with a test set of 16 patients. Statistical tests performed over the dataset consisting of the first three weeks of treatment suggest that the means of the Jacobian in the regions follow a particular order. Although this observation is apparent when applied to the distribution over the whole population, the ordering deviates for many individual cases. We propose a hypothesis to classify patients who have had a partial response (PR). Early prediction of the response was studied using only three weeks of data. The early prediction of treatment response was supported by a Fisher's test with an odds ratio of 5.13 and a p-value of 0.043.
Submitted 27 July, 2017;
originally announced July 2017.
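The statement in the abstract above that the Jacobian measures local expansion or contraction can be illustrated with a small numerical sketch. It assumes a 2D displacement field on a regular grid (the paper works with 3D CBCT volumes), and the function name and `spacing` parameter are hypothetical. For a deformation $x \mapsto x + u(x)$, the Jacobian determinant is $\det(I + \nabla u)$: values above 1 indicate local expansion, values below 1 local contraction.

```python
import numpy as np

def jacobian_determinant_2d(ux, uy, spacing=1.0):
    """Jacobian determinant of the map (x, y) -> (x + ux, y + uy) on a regular grid.

    ux, uy: 2D arrays of displacement components, indexed as [row (y), column (x)].
    spacing: assumed isotropic grid spacing.
    """
    # np.gradient returns derivatives along axis 0 (y) first, then axis 1 (x).
    dux_dy, dux_dx = np.gradient(ux, spacing)
    duy_dy, duy_dx = np.gradient(uy, spacing)
    # det(I + grad u) for the 2x2 case.
    return (1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx

# Uniform 10% stretch in both directions: every unit area grows by 1.1 * 1.1.
y, x = np.mgrid[0:5, 0:6].astype(float)
J = jacobian_determinant_2d(0.1 * x, 0.1 * y)
```

Region-wise statistics such as those described in the abstract would then be means of `J` taken over the voxels of each labeled region.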
-
Bayesian Non-parametric Simultaneous Quantile Regression for Complete and Grid Data
Authors:
Priyam Das,
Subhashis Ghosal
Abstract:
In this paper, we consider Bayesian methods for non-parametric quantile regression with multiple continuous predictors taking values in the unit interval. In the first method, the quantile function is assumed to be smooth over the explanatory variable and is expanded in a tensor product of B-spline basis functions. In the second method, the distribution function is assumed to be smooth over the explanatory variable and is expanded in a tensor product of B-spline basis functions. Unlike other existing methods of non-parametric quantile regression, the proposed methods estimate the whole quantile function instead of estimating it on a grid of quantiles. Priors on the B-spline coefficients are chosen so that the monotonicity of the estimated quantile levels is maintained, unlike in local polynomial quantile regression methods. The proposed methods have also been modified for quantile grid data, where only the percentile range of each response observation is known. Simulation studies are provided for both complete and quantile grid data. The proposed methods are used to estimate the quantiles of US household income data and North Atlantic hurricane intensity data.
Submitted 30 November, 2016;
originally announced December 2016.
-
Understanding Sea Ice Melting via Functional Data Analysis
Authors:
Purba Das,
Ananya Lahiri,
Sourish Das
Abstract:
In this article, we consider the problem of melting sea ice cover. Treating the `satellite passive microwave remote sensing data' as functional data, we study the daily observations of sea ice cover in each year as a smooth continuous function of time. We investigate the mean function of the sea ice area for successive decades and compute the corresponding $95\%$ bootstrap confidence intervals for both the Arctic and Antarctic Oceans. We find that the mean function of the sea ice area has dropped statistically significantly in recent decades for the Arctic Ocean. However, no such statistical evidence is found for the Antarctic Ocean; essentially, the mean function of the sea ice area in the Antarctic Ocean is unchanged. Additional evidence of the melting of sea ice in the Arctic Ocean is provided by three types of phase curve (namely, Area vs. Velocity, Area vs. Acceleration, and Velocity vs. Acceleration). In the Arctic Ocean, the summer sea ice area in the current decade is about $30\%$ smaller than it was during the first decade. Our analysis is distribution-free, except for the assumption that the data-generating process belongs to a Hilbert space.
Submitted 22 October, 2016;
originally announced October 2016.
-
Analyzing Ozone Concentration by Bayesian Spatio-temporal Quantile Regression
Authors:
Priyam Das,
Subhashis Ghosal
Abstract:
Ground-level ozone is one of the six common air pollutants for which the EPA has set national air quality standards. In order to capture the spatio-temporal trend of 1-hour and 8-hour average ozone concentration in the US, we develop a method for spatio-temporal simultaneous quantile regression. Unlike existing procedures, the proposed method incorporates smoothing across sites within the modeling assumptions, thus allowing borrowing of information across locations, an essential step when the number of samples at each location is low. The quantile function is assumed to be linear in time and smooth over space, and at any given site it is given by a convex combination of two monotone increasing functions $ξ_1$ and $ξ_2$ not depending on time. A B-spline basis expansion with increasing coefficients varying smoothly over space is used to put a prior, and a Bayesian analysis is performed. We analyze the average daily 1-hour maximum and 8-hour maximum ozone concentration data for the US and California during 2006-2015 using the proposed method. Over the last ten years, there is an overall decreasing trend in both 1-hour maximum and 8-hour maximum ozone concentration levels over most parts of the US. In California, an overall decreasing trend in the 1-hour maximum ozone level is observed, while no particular overall trend is observed for the 8-hour maximum ozone level.
Submitted 5 December, 2016; v1 submitted 15 September, 2016;
originally announced September 2016.
-
Bayesian Quantile Regression Using Random B-spline Series Prior
Authors:
Priyam Das,
Subhashis Ghoshal
Abstract:
We consider a Bayesian method for simultaneous quantile regression on a real variable. By monotone transformation, we can make both the response variable and the predictor variable take values in the unit interval. A representation of the quantile function is given by a convex combination of two monotone increasing functions $ξ_1$ and $ξ_2$ not depending on the predictor variables. In a Bayesian approach, a prior is put on quantile functions by putting prior distributions on $ξ_1$ and $ξ_2$. The monotonicity constraints on the curves $ξ_1$ and $ξ_2$ are enforced through a spline basis expansion with coefficients increasing and lying in the unit interval. We put a Dirichlet prior distribution on the spacings of the coefficient vector. A finite random series based on splines obeys the shape restrictions. Through an extensive simulation study, we compare our approach with a Bayesian method using a Gaussian process prior and with some other Bayesian approaches proposed in the literature. An application to data on hurricane activity in the Atlantic region is given. We also apply our method to region-wise population data of the USA for the period 1985--2010.
Submitted 9 September, 2016;
originally announced September 2016.
-
Recursive Modified Pattern Search on High-dimensional Simplex : A Blackbox Optimization Technique
Authors:
Priyam Das
Abstract:
In this paper, a novel derivative-free, pattern-search-based algorithm for black-box optimization is proposed over a simplex-constrained parameter space. At each iteration, starting from the current solution, a set of new candidate solutions is found by adding a set of derived step-size vectors to the starting point. While deriving these step-size vectors, precautions and adjustments are made so that the new candidate points remain within the simplex-constrained space. Thus, no extra time is spent evaluating the (possibly expensive) objective function at infeasible points (points outside the unit simplex). While minimizing an objective function of m parameters, the objective function is evaluated at 2m new candidate points within each iteration. So up to 2m parallel threads can be used, which makes the computation even faster when optimizing expensive objective functions over a high-dimensional parameter space. Once a local minimum is discovered, a novel `re-start' strategy is used to increase the likelihood of finding a better solution. Unlike existing pattern-search-based methods, a sparsity control parameter is introduced, which can be used to induce sparsity in the solution when the solution is expected a priori to be sparse. A comparative study of the performance of the proposed algorithm and other existing algorithms is presented for a few low-, moderate-, and high-dimensional optimization problems. Up to a 338-fold improvement in computation time is achieved using the proposed algorithm over the genetic algorithm, along with a better solution. The proposed algorithm is used to estimate the simultaneous quantiles of North Atlantic hurricane velocities during 1981-2006 by maximizing a non-closed-form likelihood function with (possibly) multiple maxima.
Submitted 30 January, 2019; v1 submitted 28 April, 2016;
originally announced April 2016.
-
Black-box optimization on hyper-rectangle using Recursive Modified Pattern Search and application to ROC-based Classification Problem
Authors:
Priyam Das
Abstract:
In statistics, it is common to encounter multi-modal and non-smooth likelihood (or objective function) maximization problems, where the parameters have known upper and lower bounds. This paper proposes a novel derivative-free global optimization technique that can be used to solve those problems even when the objective function is not known explicitly or its derivatives are difficult or expensive to obtain. The technique is based on the pattern search algorithm, which has been shown to be effective for black-box optimization problems. The proposed algorithm works by iteratively generating new solutions from the current solution. The new solutions are generated by making movements along the coordinate axes of the constrained sample space. Before making a jump from the current solution to a new solution, the objective function is evaluated at several neighborhood points around the current solution. The best solution point is then chosen based on the objective function values at those points. Parallel threading can be used to make the algorithm more scalable. The performance of the proposed method is evaluated by optimizing up to 5000-dimensional multi-modal benchmark functions. The proposed algorithm is shown to be up to 40 and 368 times faster than genetic algorithm (GA) and simulated annealing (SA), respectively. The proposed method is also used to estimate the optimal biomarker combination from Alzheimer's disease data by maximizing the empirical estimates of the area under the receiver operating characteristic curve (AUC), outperforming the contextual popular alternative, known as step-down algorithm.
Submitted 12 September, 2023; v1 submitted 28 April, 2016;
originally announced April 2016.
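The coordinate-wise search described in the abstract above can be sketched as a plain compass (pattern) search on a box. This is a simplified illustration under stated assumptions, not the paper's Recursive Modified Pattern Search: it has no restart strategy or parallel evaluation, and the function name, step-halving rule, and default parameters are hypothetical.

```python
import numpy as np

def coordinate_pattern_search(f, x0, lower, upper, step=0.5, tol=1e-6, max_iter=10_000):
    """Minimize f over the box [lower, upper] by moves along the coordinate axes.

    Each iteration evaluates f at up to 2m neighbors (one +step and one -step
    move per coordinate, clipped to the bounds), jumps to the best improving
    neighbor, and halves the step size when no neighbor improves.
    """
    x = np.asarray(x0, dtype=float)
    lower = np.broadcast_to(np.asarray(lower, dtype=float), x.shape)
    upper = np.broadcast_to(np.asarray(upper, dtype=float), x.shape)
    x = np.clip(x, lower, upper)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        best_x, best_f = x, fx
        for i in range(x.size):
            for sign in (1.0, -1.0):
                cand = x.copy()
                # Clipping keeps every candidate feasible, so f is never
                # evaluated outside the box.
                cand[i] = min(max(cand[i] + sign * step, lower[i]), upper[i])
                fc = f(cand)
                if fc < best_f:
                    best_x, best_f = cand, fc
        if best_f < fx:
            x, fx = best_x, best_f   # accept the best neighbor
        else:
            step *= 0.5              # no improvement: refine the step size
    return x, fx
```

The 2m candidate evaluations inside one iteration are independent of each other, which is what makes the parallel threading mentioned in the abstract possible. To maximize an objective such as an empirical AUC, one would pass its negative as `f`.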
-
Multi-task Sparse Structure Learning
Authors:
Andre R. Goncalves,
Puja Das,
Soumyadeep Chatterjee,
Vidyashankar Sivakumar,
Fernando J. Von Zuben,
Arindam Banerjee
Abstract:
Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously. While sometimes the underlying task relationship structure is known, often the structure needs to be estimated from data at hand. In this paper, we present a novel family of models for MTL, applicable to regression and classification problems, capable of learning the structure of task relationships. In particular, we consider a joint estimation problem of the task relationship structure and the individual task parameters, which is solved using alternating minimization. The task relationship structure learning component builds on recent advances in structure learning of Gaussian graphical models based on sparse estimators of the precision (inverse covariance) matrix. We illustrate the effectiveness of the proposed model on a variety of synthetic and benchmark datasets for regression and classification. We also consider the problem of combining climate model outputs for better projections of future climate, with focus on temperature in South America, and show that the proposed model outperforms several existing methods for the problem.
Submitted 1 September, 2014; v1 submitted 31 August, 2014;
originally announced September 2014.