-
Asymptotic well-calibration of the posterior predictive $p$-value under the modified Kolmogorov-Smirnov test
Authors:
Yueming Shen
Abstract:
The posterior predictive $p$-value is a widely used tool for Bayesian model checking. However, under most test statistics, its asymptotic null distribution is more concentrated around 1/2 than uniform. Consequently, its finite-sample behavior is difficult to interpret and tends to lack power, which is a well-known issue among practitioners. A common choice of test statistic is the Kolmogorov-Smirnov test with plug-in estimators. It provides a global measure of model-data discrepancy for real-valued observations and is sensitive to model misspecification. In this work, we establish that under this test statistic, the posterior predictive $p$-value converges in distribution to uniform under the null. We further use numerical experiments to demonstrate that this $p$-value is well-behaved in finite samples and can effectively detect a wide range of alternative models.
Submitted 22 April, 2025; v1 submitted 18 April, 2025;
originally announced April 2025.
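As a rough illustration of the quantity discussed in the abstract above, the following minimal sketch computes a posterior predictive $p$-value with a plug-in Kolmogorov-Smirnov statistic. Everything specific here is an assumption made for illustration and is not taken from the paper: the model is $N(\theta, 1)$ with a flat prior (so the posterior is $N(\bar{y}, 1/n)$), the plug-in estimator is the sample mean of whichever dataset is being scored, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def ks_plugin_statistic(sample):
    """KS distance between the empirical CDF and N(mean(sample), 1).

    The model CDF is evaluated at the plug-in estimate (the sample mean)
    of the same dataset; the unit variance is an assumed known quantity.
    """
    n = len(sample)
    fitted = NormalDist(mu=fmean(sample), sigma=1.0)
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def posterior_predictive_pvalue(y, draws=2000, seed=0):
    """Monte Carlo estimate of P(T(y_rep) >= T(y) | y) under the assumed model."""
    rng = random.Random(seed)
    n = len(y)
    t_obs = ks_plugin_statistic(y)
    ybar = fmean(y)
    post_sd = (1.0 / n) ** 0.5  # posterior standard deviation under a flat prior
    exceed = 0
    for _ in range(draws):
        theta = rng.gauss(ybar, post_sd)                    # draw from the posterior
        y_rep = [rng.gauss(theta, 1.0) for _ in range(n)]   # replicated dataset
        if ks_plugin_statistic(y_rep) >= t_obs:
            exceed += 1
    return exceed / draws
```

Calling `posterior_predictive_pvalue(y)` on data that the assumed model fits poorly (for example, a strongly bimodal sample) should give a value near 0. The sketch does not by itself demonstrate the asymptotic uniformity claimed in the abstract.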
-
Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation
Authors:
Yueming Lyu,
Kim Yong Tan,
Yew Soon Ong,
Ivor W. Tsang
Abstract:
Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users remains challenging. To address this issue, we first formulate the fine-tuning of the targeted reverse-time stochastic differential equation (SDE) associated with a pre-trained diffusion model as a sequential black-box optimization problem. Furthermore, we propose a novel covariance-adaptive sequential optimization algorithm to optimize cumulative black-box scores under unknown transition dynamics. Theoretically, we prove an $O(\frac{d^2}{\sqrt{T}})$ convergence rate for cumulative convex functions without smoothness or strong convexity assumptions. Empirically, experiments on both numerical test problems and target-guided 3D-molecule generation tasks show the superior performance of our method in achieving better target scores.
Submitted 8 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Quantifying Grid Resilience Against Extreme Weather Using Large-Scale Customer Power Outage Data
Authors:
Shixiang Zhu,
Rui Yao,
Yao Xie,
Feng Qiu,
Yueming Qiu,
Xuan Wu
Abstract:
In recent decades, the weather around the world has become more irregular and extreme, often causing large-scale extended power outages. Resilience -- the capability of withstanding, adapting to, and recovering from a large-scale disruption -- has become a top priority for the power sector. However, the understanding of power grid resilience still remains mostly at the conceptual level or focuses on particular components, yielding no actionable results and revealing few insights at the system level. This study provides a quantitatively measurable definition of power grid resilience, using a statistical model inspired by patterns observed from data and domain knowledge. We analyze large-scale quarter-hourly historical electricity customer outage data and the corresponding weather records, and draw connections between the model and industry resilience practice. We showcase the resilience analysis using three major service territories on the east coast of the United States. Our analysis suggests that cumulative weather effects play a key role in causing immediate, sustained outages, and these outages can propagate and cause secondary outages in neighboring areas. The proposed model also provides some interesting insights into grid resilience enhancement planning. For example, our simulation results indicate that enhancing the power infrastructure in a small number of critical locations can reduce the number of customer power outages in Massachusetts by nearly half. In addition, we show that our model achieves promising accuracy in predicting the progress of customer power outages throughout extreme weather events, which can be very valuable for system operators and federal agencies in preparing disaster response.
Submitted 4 September, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
Neural Optimization Kernel: Towards Robust Deep Learning
Authors:
Yueming Lyu,
Ivor Tsang
Abstract:
Deep neural networks (NN) have achieved great success in many applications. However, why deep neural networks generalize well in the over-parameterized regime is still unclear. To better understand deep NN, we establish a connection between deep NN and a novel kernel family, i.e., the Neural Optimization Kernel (NOK). The architecture of the structured approximation of NOK performs monotonic descent updates of implicit regularization problems. We can implicitly choose the regularization problems by employing different activation functions, e.g., ReLU, max pooling, and soft-thresholding. We further establish a new generalization bound for our deep structured approximated NOK architecture. Our unsupervised structured approximated NOK block can serve as a simple plug-in for popular backbones to achieve good generalization against input noise.
Submitted 30 November, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Subgroup-based Rank-1 Lattice Quasi-Monte Carlo
Authors:
Yueming Lyu,
Yuan Yuan,
Ivor W. Tsang
Abstract:
Quasi-Monte Carlo (QMC) is an essential tool for integral approximation, Bayesian inference, and sampling for simulation in science, etc. In the QMC area, the rank-1 lattice is important due to its simple operation, and nice properties for point set construction. However, the construction of the generating vector of the rank-1 lattice is usually time-consuming because of an exhaustive computer search. To address this issue, we propose a simple closed-form rank-1 lattice construction method based on group theory. Our method reduces the number of distinct pairwise distance values to generate a more regular lattice. We theoretically prove a lower and an upper bound of the minimum pairwise distance of any non-degenerate rank-1 lattice. Empirically, our methods can generate a near-optimal rank-1 lattice compared with the Korobov exhaustive search regarding the $l_1$-norm and $l_2$-norm minimum distance. Moreover, experimental results show that our method achieves superior approximation performance on benchmark integration test problems and kernel approximation problems.
Submitted 28 October, 2020;
originally announced November 2020.
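For context on the objects named in the abstract above, here is a minimal sketch of a rank-1 lattice and of the exhaustive Korobov-style search that the abstract uses as its baseline. It does not implement the paper's closed-form subgroup construction. The point set uses the standard definition $x_i = \{ i z / n \}$ for $i = 0, \dots, n-1$; the use of the toroidal (wrap-around) distance and all function names are assumptions made for illustration.

```python
import math


def rank1_lattice(n, z):
    """Points x_i = frac(i * z / n), i = 0..n-1, in [0, 1)^d for generating vector z."""
    return [[((i * zj) % n) / n for zj in z] for i in range(n)]


def min_toroidal_distance(points, p=2):
    """Minimum l_p toroidal distance between distinct lattice points.

    Differences of lattice points are again lattice points (mod 1), so it
    suffices to measure each point with index i >= 1 against the origin
    (index 0). Assumes those points are non-zero, which holds when z[0] = 1.
    """
    best = math.inf
    for x in points[1:]:
        wrapped = [min(c, 1.0 - c) for c in x]  # wrap-around distance per coordinate
        best = min(best, sum(c ** p for c in wrapped) ** (1.0 / p))
    return best


def korobov_search(n, d, p=2):
    """Exhaustive search over Korobov vectors z = (1, a, a^2, ..., a^(d-1)) mod n."""
    best_a, best_dist = None, -1.0
    for a in range(1, n):
        z = [pow(a, k, n) for k in range(d)]
        dist = min_toroidal_distance(rank1_lattice(n, z), p)
        if dist > best_dist:
            best_a, best_dist = a, dist
    return best_a, best_dist
```

The search evaluates all $n - 1$ candidate values of `a`, each requiring a pass over the $n$ lattice points, which illustrates why the abstract describes exhaustive construction as time-consuming for large $n$.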
-
Intrinsic Reward Driven Imitation Learning via Generative Model
Authors:
Xingrui Yu,
Yueming Lyu,
Ivor W. Tsang
Abstract:
Imitation learning in a high-dimensional environment is challenging. Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in such a high-dimensional environment, e.g., Atari domain. To address this challenge, we propose a novel reward learning module to generate intrinsic reward signals via a generative model. Our generative method can perform better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment. Thus, our module provides the imitation agent both the intrinsic intention of the demonstrator and a better exploration ability, which is critical for the agent to outperform the demonstrator. Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with one-life demonstration. Remarkably, our method achieves performance that is up to 5 times the performance of the demonstration.
Submitted 11 September, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition
Authors:
Xueying Shi,
Yueming Jin,
Qi Dou,
Pheng-Ann Heng
Abstract:
Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing deep learning approaches have achieved remarkable performance on the analysis of surgical videos, but they rely heavily on large-scale labelled datasets. Unfortunately, annotations are rarely available in abundance, because producing them requires the domain knowledge of surgeons. In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within a clip. By ranking scores among clips in the unlabelled data pool, we select the clips with weak dependencies for annotation, as these are the most informative ones for network training. We validate our approach on a large surgical video dataset (Cholec80) on the surgical workflow recognition task. With our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods. Using only up to 50% of the samples, our approach can exceed the performance of full-data training.
Submitted 23 April, 2020; v1 submitted 21 April, 2020;
originally announced April 2020.
-
Dynamic Ensemble Modeling Approach to Nonstationary Neural Decoding in Brain-Computer Interfaces
Authors:
Yu Qi,
Bin Liu,
Yueming Wang,
Gang Pan
Abstract:
Brain-computer interfaces (BCIs) have enabled prosthetic device control by decoding motor movements from neural activities. Neural signals recorded from the cortex are nonstationary due to abrupt noise and neuroplastic changes in brain activity during motor control. Current state-of-the-art neural signal decoders such as the Kalman filter assume a fixed relationship between neural activities and motor movements, and thus fail when this assumption is not satisfied. We propose a dynamic ensemble modeling (DyEnsemble) approach that is capable of adapting to changes in neural signals by employing a proper combination of decoding functions. The DyEnsemble method first learns a set of diverse candidate models. Then, it dynamically selects and combines these models online according to a Bayesian updating mechanism. Our method can mitigate the effect of noise and cope with different task behaviors by automatic model switching, and thus gives more accurate predictions. Experiments with neural data demonstrate that the DyEnsemble method outperforms Kalman filters remarkably, and its advantage is more obvious with noisy signals.
Submitted 2 November, 2019;
originally announced November 2019.
-
Black-box Optimizer with Implicit Natural Gradient
Authors:
Yueming Lyu,
Ivor W. Tsang
Abstract:
Black-box optimization is of primary importance for many compute-intensive applications, including reinforcement learning (RL), robot control, etc. This paper presents a novel theoretical framework for black-box optimization, in which our method performs stochastic updates with the implicit natural gradient of an exponential-family distribution. Theoretically, we prove the convergence rate of our framework with full matrix update for convex functions. Our theoretical results also hold for continuous non-differentiable black-box functions. Our methods are very simple and contain fewer hyper-parameters than CMA-ES \cite{hansen2006cma}. Empirically, our method with full matrix update achieves competitive performance compared with the state-of-the-art method CMA-ES on benchmark test problems. Moreover, our methods can achieve high optimization precision on some challenging test functions (e.g., $l_1$-norm ellipsoid test problem and Levy test problem), while methods with explicit natural gradient, i.e., IGO \cite{ollivier2017information} with full matrix update, cannot. This shows the efficiency of our methods.
Submitted 9 September, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Curriculum Loss: Robust Learning and Generalization against Label Corruption
Authors:
Yueming Lyu,
Ivor W. Tsang
Abstract:
Deep neural networks (DNNs) have great expressive power, which can even memorize samples with wrong labels. It is vitally important to reiterate robustness and generalization in DNNs against label corruption. To this end, this paper studies the 0-1 loss, which has a monotonic relationship with an empirical adversary (reweighted) risk~\citep{hu2016does}. Although the 0-1 loss has some robust properties, it is difficult to optimize. To efficiently optimize the 0-1 loss while keeping its robust properties, we propose a very simple and efficient loss, i.e. curriculum loss (CL). Our CL is a tighter upper bound of the 0-1 loss compared with conventional summation based surrogate losses. Moreover, CL can adaptively select samples for model training. As a result, our loss can be deemed as a novel perspective of curriculum sample selection strategy, which bridges a connection between curriculum learning and robust learning. Experimental results on benchmark datasets validate the robustness of the proposed loss.
Submitted 20 February, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Efficient Batch Black-box Optimization with Deterministic Regret Bounds
Authors:
Yueming Lyu,
Yuan Yuan,
Ivor W. Tsang
Abstract:
In this work, we investigate black-box optimization from the perspective of frequentist kernel methods. We propose a novel batch optimization algorithm, which jointly maximizes the acquisition function and selects points from a whole batch in a holistic way. Theoretically, we derive regret bounds for both the noise-free and perturbation settings irrespective of the choice of kernel. Moreover, we analyze the property of the adversarial regret that is required by a robust initialization for Bayesian Optimization (BO). We prove that the adversarial regret bounds decrease as the covering radius decreases, which provides a criterion for generating a point set to minimize the bound. We then propose fast searching algorithms to generate a point set with a small covering radius for the robust initialization. Experimental results on both synthetic benchmark problems and real-world problems show the effectiveness of the proposed algorithms.
Submitted 27 March, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Sparse Principal Component Analysis via Rotation and Truncation
Authors:
Zhenfang Hu,
Gang Pan,
Yueming Wang,
Zhaohui Wu
Abstract:
Sparse principal component analysis (sparse PCA) aims at finding a sparse basis to improve the interpretability over the dense basis of PCA, while the sparse basis should cover the data subspace as much as possible. In contrast to most existing work, which deals with the problem by adding sparsity penalties to various objectives of PCA, in this paper we propose a new method, SPCArt, whose motivation is to find a rotation matrix and a sparse basis such that the sparse basis approximates the basis of PCA after the rotation. The algorithm of SPCArt consists of three alternating steps: rotate the PCA basis, truncate small entries, and update the rotation matrix. Its performance bounds are also given. SPCArt is efficient, with each iteration scaling linearly with the data dimension. It is easy to choose parameters in SPCArt, due to their explicit physical explanations. Besides, we give a unified view of several existing sparse PCA methods and discuss their connection with SPCArt. Some ideas in SPCArt are extended to GPower, a popular sparse PCA algorithm, to overcome its drawback. Experimental results demonstrate that SPCArt achieves state-of-the-art performance. It also achieves a good tradeoff among various criteria, including sparsity, explained variance, orthogonality, balance of sparsity among loadings, and computational speed.
Submitted 1 May, 2014; v1 submitted 6 March, 2014;
originally announced March 2014.