-
Gradient-bridged Posterior: Bayesian Inference for Models with Implicit Functions
Authors:
Cheng Zeng,
Yaozhi Yang,
Jason Xu,
Leo L Duan
Abstract:
Many statistical problems include model parameters that are defined as the solutions to optimization sub-problems. These include classical approaches such as profile likelihood as well as modern applications involving flow networks or Procrustes distances. In such cases, the likelihood of the data involves an implicit function, often complicating inferential procedures and entailing prohibitive co…
▽ More
Many statistical problems include model parameters that are defined as the solutions to optimization sub-problems. These include classical approaches such as profile likelihood as well as modern applications involving flow networks or Procrustes distances. In such cases, the likelihood of the data involves an implicit function, often complicating inferential procedures and entailing prohibitive computational cost. In this article, we propose an intuitive and tractable posterior inference approach for this setting. We introduce a class of continuous models that handle implicit function values using the first-order optimality of the sub-problems. Specifically, we apply a shrinkage kernel to the gradient norm, which retains a probabilistic interpretation within a generative model. This can be understood as a generalization of the Gibbs posterior framework to newly enable concentration around partial minimizers in a subset of the parameters. We show that this method, termed the gradient-bridged posterior, is amenable to efficient posterior computation, and enjoys theoretical guarantees, establishing a Bernstein--von Mises theorem for asymptotic normality. The advantages of our approach are highlighted on a synthetic flow network experiment and an application to data integration using Procrustes distances.
△ Less
Submitted 14 March, 2025;
originally announced March 2025.
-
How Much Can Time-related Features Enhance Time Series Forecasting?
Authors:
Chaolv Zeng,
Yuan Tian,
Guanjie Zheng,
Yunjun Gao
Abstract:
Recent advancements in long-term time series forecasting (LTSF) have primarily focused on capturing cross-time and cross-variate (channel) dependencies within historical data. However, a critical aspect often overlooked by many existing methods is the explicit incorporation of \textbf{time-related features} (e.g., season, month, day of the week, hour, minute), which are essential components of tim…
▽ More
Recent advancements in long-term time series forecasting (LTSF) have primarily focused on capturing cross-time and cross-variate (channel) dependencies within historical data. However, a critical aspect often overlooked by many existing methods is the explicit incorporation of \textbf{time-related features} (e.g., season, month, day of the week, hour, minute), which are essential components of time series data. The absence of this explicit time-related encoding limits the ability of current models to capture cyclical or seasonal trends and long-term dependencies, especially with limited historical input. To address this gap, we introduce a simple yet highly efficient module designed to encode time-related features, Time Stamp Forecaster (TimeSter), thereby enhancing the backbone's forecasting performance. By integrating TimeSter with a linear backbone, our model, TimeLinear, significantly improves the performance of a single linear projector, reducing MSE by an average of 23\% on benchmark datasets such as Electricity and Traffic. Notably, TimeLinear achieves these gains while maintaining exceptional computational efficiency, delivering results that are on par with or exceed state-of-the-art models, despite using a fraction of the parameters.
△ Less
Submitted 2 December, 2024;
originally announced December 2024.
-
The Bridged Posterior: Optimization, Profile Likelihood and a New Approach to Generalized Bayes
Authors:
Cheng Zeng,
Eleni Dilma,
Jason Xu,
Leo L Duan
Abstract:
Optimization is widely used in statistics, thanks to its efficiency for delivering point estimates on useful spaces, such as those satisfying low cardinality or combinatorial structure. To quantify uncertainty, Gibbs posterior exponentiates the negative loss function to form a posterior density. Nevertheless, Gibbs posteriors are supported in a high-dimensional space, and do not inherit the comput…
▽ More
Optimization is widely used in statistics, thanks to its efficiency for delivering point estimates on useful spaces, such as those satisfying low cardinality or combinatorial structure. To quantify uncertainty, Gibbs posterior exponentiates the negative loss function to form a posterior density. Nevertheless, Gibbs posteriors are supported in a high-dimensional space, and do not inherit the computational efficiency or constraint formulations from optimization. In this article, we explore a new generalized Bayes approach, viewing the likelihood as a function of data, parameters, and latent variables conditionally determined by an optimization sub-problem. Marginally, the latent variable given the data remains stochastic, and is characterized by its posterior distribution. This framework, coined ``bridged posterior'', conforms to the Bayesian paradigm. Besides providing a novel generative model, we obtain a positively surprising theoretical finding that under mild conditions, the $\sqrt{n}$-adjusted posterior distribution of the parameters under our model converges to the same normal distribution as that of the canonical integrated posterior. Therefore, our result formally dispels a long-held belief that partial optimization of latent variables may lead to under-estimation of parameter uncertainty. We demonstrate the practical advantages of our approach under several settings, including maximum-margin classification, latent normal models, and harmonization of multiple networks.
△ Less
Submitted 1 March, 2024;
originally announced March 2024.
-
Analyze the Robustness of Classifiers under Label Noise
Authors:
Cheng Zeng,
Yixuan Xu,
Jiaqi Tian
Abstract:
This study explores the robustness of label noise classifiers, aiming to enhance model resilience against noisy data in complex real-world scenarios. Label noise in supervised learning, characterized by erroneous or imprecise labels, significantly impairs model performance. This research focuses on the increasingly pertinent issue of label noise's impact on practical applications. Addressing the p…
▽ More
This study explores the robustness of label noise classifiers, aiming to enhance model resilience against noisy data in complex real-world scenarios. Label noise in supervised learning, characterized by erroneous or imprecise labels, significantly impairs model performance. This research focuses on the increasingly pertinent issue of label noise's impact on practical applications. Addressing the prevalent challenge of inaccurate training data labels, we integrate adversarial machine learning (AML) and importance reweighting techniques. Our approach involves employing convolutional neural networks (CNN) as the foundational model, with an emphasis on parameter adjustment for individual training samples. This strategy is designed to heighten the model's focus on samples critically influencing performance.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
Solving PDEs on Spheres with Physics-Informed Convolutional Neural Networks
Authors:
Guanhang Lei,
Zhen Lei,
Lei Shi,
Chenyu Zeng,
Ding-Xuan Zhou
Abstract:
Physics-informed neural networks (PINNs) have been demonstrated to be efficient in solving partial differential equations (PDEs) from a variety of experimental perspectives. Some recent studies have also proposed PINN algorithms for PDEs on surfaces, including spheres. However, theoretical understanding of the numerical performance of PINNs, especially PINNs on surfaces or manifolds, is still lack…
▽ More
Physics-informed neural networks (PINNs) have been demonstrated to be efficient in solving partial differential equations (PDEs) from a variety of experimental perspectives. Some recent studies have also proposed PINN algorithms for PDEs on surfaces, including spheres. However, theoretical understanding of the numerical performance of PINNs, especially PINNs on surfaces or manifolds, is still lacking. In this paper, we establish rigorous analysis of the physics-informed convolutional neural network (PICNN) for solving PDEs on the sphere. By using and improving the latest approximation results of deep convolutional neural networks and spherical harmonic analysis, we prove an upper bound for the approximation error with respect to the Sobolev norm. Subsequently, we integrate this with innovative localization complexity analysis to establish fast convergence rates for PICNN. Our theoretical results are also confirmed and supplemented by our experiments. In light of these findings, we explore potential strategies for circumventing the curse of dimensionality that arises when solving high-dimensional PDEs.
△ Less
Submitted 5 August, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
On Learning Prediction-Focused Mixtures
Authors:
Abhishek Sharma,
Catherine Zeng,
Sanjana Narayanan,
Sonali Parbhoo,
Finale Doshi-Velez
Abstract:
Probabilistic models help us encode latent structures that both model the data and are ideally also useful for specific downstream tasks. Among these, mixture models and their time-series counterparts, hidden Markov models, identify discrete components in the data. In this work, we focus on a constrained capacity setting, where we want to learn a model with relatively few components (e.g. for inte…
▽ More
Probabilistic models help us encode latent structures that both model the data and are ideally also useful for specific downstream tasks. Among these, mixture models and their time-series counterparts, hidden Markov models, identify discrete components in the data. In this work, we focus on a constrained capacity setting, where we want to learn a model with relatively few components (e.g. for interpretability purposes). To maintain prediction performance, we introduce prediction-focused modeling for mixtures, which automatically selects the dimensions relevant to the prediction task. Our approach identifies relevant signal from the input, outperforms models that are not prediction-focused, and is easy to optimize; we also characterize when prediction-focused modeling can be expected to work.
△ Less
Submitted 27 October, 2021; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Bivariate Hierarchical Bayesian Model for Combining Summary Measures and their Uncertainties from Multiple Sources
Authors:
Yujing Yao,
R. Todd Ogden,
Chubing Zeng,
Qixuan Chen
Abstract:
It is often of interest to combine available estimates of a similar quantity from multiple data sources. When the corresponding variances of each estimate are also available, a model should take into account the uncertainty of the estimates themselves as well as the uncertainty in the estimation of variances. In addition, if there exists a strong association between estimates and their variances,…
▽ More
It is often of interest to combine available estimates of a similar quantity from multiple data sources. When the corresponding variances of each estimate are also available, a model should take into account the uncertainty of the estimates themselves as well as the uncertainty in the estimation of variances. In addition, if there exists a strong association between estimates and their variances, the correlation between these two quantities should also be considered. In this paper, we propose a bivariate hierarchical Bayesian model that jointly models the estimates and their estimated variances assuming a correlation between these two measures. We conduct simulations to explore the performance of the proposed bivariate Bayesian model and compare it to other commonly used methods under different correlation scenarios. The proposed bivariate Bayesian model has a wide range of applications. We illustrate its application in three very different areas: PET brain imaging studies, meta-analysis, and small area estimation.
△ Less
Submitted 15 September, 2021;
originally announced September 2021.
-
Consistent Model-based Clustering: using the Quasi-Bernoulli Stick-Breaking Process
Authors:
Cheng Zeng,
Jeffrey W. Miller,
Leo L. Duan
Abstract:
In mixture modeling and clustering applications, the number of components and clusters is often not known. A stick-breaking mixture model, such as the Dirichlet process mixture model, is an appealing construction that assumes infinitely many components, while shrinking the weights of most of the unused components to near zero. However, it is well-known that this shrinkage is inadequate: even when…
▽ More
In mixture modeling and clustering applications, the number of components and clusters is often not known. A stick-breaking mixture model, such as the Dirichlet process mixture model, is an appealing construction that assumes infinitely many components, while shrinking the weights of most of the unused components to near zero. However, it is well-known that this shrinkage is inadequate: even when the component distribution is correctly specified, spurious weights appear and give an inconsistent estimate of the number of clusters. In this article, we propose a simple solution: when breaking each mixture weight stick into two pieces, the length of the second piece is multiplied by a quasi-Bernoulli random variable, taking value one or a small constant close to zero. This effectively creates a soft-truncation and further shrinks the unused weights. Asymptotically, we show that as long as this small constant diminishes to zero at a rate faster than $o(1/n^2)$, with $n$ the sample size, the posterior distribution will converge to the true number of clusters. In comparison, we rigorously explore Dirichlet process mixture models using a concentration parameter that is either constant or rapidly diminishes to zero -- both of which lead to inconsistency for the number of clusters. Our proposed model is easy to implement, requiring only a small modification of a standard Gibbs sampler for mixture models. In simulations and a data application of clustering brain networks, our proposed method recovers the ground-truth number of clusters, and leads to a small number of clusters.
△ Less
Submitted 21 February, 2023; v1 submitted 22 August, 2020;
originally announced August 2020.
-
From predictions to prescriptions: A data-driven response to COVID-19
Authors:
Dimitris Bertsimas,
Léonard Boussioux,
Ryan Cory Wright,
Arthur Delarue,
Vassilis Digalakis Jr.,
Alexandre Jacquillat,
Driss Lahlou Kitane,
Galit Lukin,
Michael Lingzhi Li,
Luca Mingardi,
Omid Nohadani,
Agni Orfanoudaki,
Theodore Papalexopoulos,
Ivan Paskov,
Jean Pauphilet,
Omar Skali Lami,
Bartolomeo Stellato,
Hamza Tazi Bouardi,
Kimberly Villalobos Carballo,
Holly Wiberg,
Cynthia Zeng
Abstract:
The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat the pandemic. Specifically, we propose a…
▽ More
The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat the pandemic. Specifically, we propose a comprehensive data-driven approach to understand the clinical characteristics of COVID-19, predict its mortality, forecast its evolution, and ultimately alleviate its impact. By leveraging cohort-level clinical data, patient-level hospital data, and census-level epidemiological data, we develop an integrated four-step approach, combining descriptive, predictive and prescriptive analytics. First, we aggregate hundreds of clinical studies into the most comprehensive database on COVID-19 to paint a new macroscopic picture of the disease. Second, we build personalized calculators to predict the risk of infection and mortality as a function of demographics, symptoms, comorbidities, and lab values. Third, we develop a novel epidemiological model to project the pandemic's spread and inform social distancing policies. Fourth, we propose an optimization model to re-allocate ventilators and alleviate shortages. Our results have been used at the clinical level by several hospitals to triage patients, guide care management, plan ICU capacity, and re-distribute ventilators. At the policy level, they are currently supporting safe back-to-work policies at a major institution and equitable vaccine distribution planning at a major pharmaceutical company, and have been integrated into the US Center for Disease Control's pandemic forecast.
△ Less
Submitted 29 June, 2020;
originally announced June 2020.
-
Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem
Authors:
Yi Zhang,
Cheng Zeng,
Hao Cheng,
Chongjun Wang,
Lei Zhang
Abstract:
With the emergence of diverse data collection techniques, objects in real applications can be represented as multi-modal features. What's more, objects may have multiple semantic meanings. Multi-modal and Multi-label (MMML) problem becomes a universal phenomenon. The quality of data collected from different channels are inconsistent and some of them may not benefit for prediction. In real life, no…
▽ More
With the emergence of diverse data collection techniques, objects in real applications can be represented as multi-modal features. What's more, objects may have multiple semantic meanings. Multi-modal and Multi-label (MMML) problem becomes a universal phenomenon. The quality of data collected from different channels are inconsistent and some of them may not benefit for prediction. In real life, not all the modalities are needed for prediction. As a result, we propose a novel instance-oriented Multi-modal Classifier Chains (MCC) algorithm for MMML problem, which can make convince prediction with partial modalities. MCC extracts different modalities for different instances in the testing phase. Extensive experiments are performed on one real-world herbs dataset and two public datasets to validate our proposed algorithm, which reveals that it may be better to extract many instead of all of the modalities at hand.
△ Less
Submitted 27 July, 2019;
originally announced July 2019.
-
A Logarithmic Barrier Method For Proximal Policy Optimization
Authors:
Cheng Zeng,
Hongming Zhang
Abstract:
Proximal policy optimization(PPO) has been proposed as a first-order optimization method for reinforcement learning. We should notice that an exterior penalty method is used in it. Often, the minimizers of the exterior penalty functions approach feasibility only in the limits as the penalty parameter grows increasingly large. Therefore, it may result in the low level of sampling efficiency. This m…
▽ More
Proximal policy optimization(PPO) has been proposed as a first-order optimization method for reinforcement learning. We should notice that an exterior penalty method is used in it. Often, the minimizers of the exterior penalty functions approach feasibility only in the limits as the penalty parameter grows increasingly large. Therefore, it may result in the low level of sampling efficiency. This method, which we call proximal policy optimization with barrier method (PPO-B), keeps almost all advantageous spheres of PPO such as easy implementation and good generalization. Specifically, a new surrogate objective with interior penalty method is proposed to avoid the defect arose from exterior penalty method. Conclusions can be draw that PPO-B is able to outperform PPO in terms of sampling efficiency since PPO-B achieved clearly better performance on Atari and Mujoco environment than PPO.
△ Less
Submitted 16 December, 2018;
originally announced December 2018.