Search | arXiv e-print repository

Disentangled Interleaving Variational Encoding

Authors: Noelle Y. L. Wong, Eng Yeow Cheu, Zhonglin Chiam, Dipti Srinivasan

Abstract: Conflicting objectives present a considerable challenge in interleaving multi-task learning, necessitating the need for meticulous design and balance to ensure effective learning of a representative latent data space across all tasks without mutual negative impact. Drawing inspiration from the concept of marginal and conditional probability distributions in probability theory, we design a principled and well-founded approach to disentangle the original input into marginal and conditional probability distributions in the latent space of a variational autoencoder. Our proposed model, Deep Disentangled Interleaving Variational Encoding (DeepDIVE) learns disentangled features from the original input to form clusters in the embedding space and unifies these features via the cross-attention mechanism in the fusion stage. We theoretically prove that combining the objectives for reconstruction and forecasting fully captures the lower bound and mathematically derive a loss function for disentanglement using Naïve Bayes. Under the assumption that the prior is a mixture of log-concave distributions, we also establish that the Kullback-Leibler divergence between the prior and the posterior is upper bounded by a function minimized by the minimizer of the cross entropy loss, informing our adoption of radial basis functions (RBF) and cross entropy with interleaving training for DeepDIVE to provide a justified basis for convergence. Experiments on two public datasets show that DeepDIVE disentangles the original input and yields forecast accuracies better than the original VAE and comparable to existing state-of-the-art baselines.

Submitted 16 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

arXiv:2310.03884 [pdf, other]

Information Geometry for the Working Information Theorist

Authors: Kumar Vijay Mishra, M. Ashok Kumar, Ting-Kam Leonard Wong

Abstract: Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 12 pages, 3 figures, 1 table

arXiv:2309.03097 [pdf, other]

An Algorithm for Modelling Escalator Fixed Loss Energy for PHM and sustainable energy usage

Authors: Xuwen Hu, Jiaqi Qiu, Yu Lin, Inez Maria Zwetsloot, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

Abstract: Prognostic Health Management (PHM) is designed to assess and monitor the health status of systems, anticipate the onset of potential failure, and prevent unplanned downtime. In recent decades, collecting massive amounts of real-time sensor data enabled condition monitoring (CM) and consequently, detection of abnormalities to support maintenance decision-making. Additionally, the utilization of PHM techniques can support energy sustainability efforts by optimizing energy usage and identifying opportunities for energy-saving measures. Escalators are efficient machines for transporting people and goods, and measuring energy consumption in time can facilitate PHM of escalators. Fixed loss energy, or no-load energy, of escalators denotes the energy consumption by an unloaded escalator. Fixed loss energy varies over time indicating varying operating conditions. In this paper, we propose to use escalators' fixed loss energy for PHM. We propose an approach to compute daily fixed loss energy based on energy consumption sensor data. The proposed approach is validated using a set of experimental data. The advantages and disadvantages of each approach are also presented, and recommendations are given. Finally, to illustrate PHM, we set up an EWMA chart for monitoring the fixed loss over time and demonstrate the potential in reducing energy costs associated with escalator operation.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2306.05436 [pdf, other]

Remaining Useful Life Modelling with an Escalator Health Condition Analytic System

Authors: Inez M. Zwetsloot, Yu Lin, Jiaqi Qiu, Lishuai Li, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

Abstract: The refurbishment of an escalator is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an escalator should be determined by its operating condition which is affected by the runtime, workload, maintenance quality, vibration, etc., rather than age only. The objective of this project is to develop a comprehensive health condition analytic system for escalators to support refurbishment decisions. The analytic system consists of four parts: 1) online data gathering and processing; 2) a dashboard for condition monitoring; 3) a health index model; and 4) remaining useful life prediction. The results can be used for a) predicting the remaining useful life of the escalators, in order to support asset replacement planning and b) monitoring the real-time condition of escalators; including alerts when vibration exceeds the threshold and signal diagnosis, giving an indication of possible root cause (components) of the alert signal.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 14 pages, 12 figures, 7 tables

arXiv:2212.13558 [pdf, other]

AER: Auto-Encoder with Regression for Time Series Anomaly Detection

Authors: Lawrence Wong, Dongyu Liu, Laure Berti-Equille, Sarah Alnegheimish, Kalyan Veeramachaneni

Abstract: Anomaly detection on time series data is increasingly common across various industrial domains that monitor metrics in order to prevent potential accidents and economic losses. However, a scarcity of labeled data and ambiguous definitions of anomalies can complicate these efforts. Recent unsupervised machine learning methods have made remarkable progress in tackling this problem using either single-timestamp predictions or time series reconstructions. While traditionally considered separately, these methods are not mutually exclusive and can offer complementary perspectives on anomaly detection. This paper first highlights the successes and limitations of prediction-based and reconstruction-based methods with visualized time series signals and anomaly scores. We then propose AER (Auto-encoder with Regression), a joint model that combines a vanilla auto-encoder and an LSTM regressor to incorporate the successes and address the limitations of each method. Our model can produce bi-directional predictions while simultaneously reconstructing the original time series by optimizing a joint objective function. Furthermore, we propose several ways of combining the prediction and reconstruction errors through a series of ablation studies. Finally, we compare the performance of the AER architecture against two prediction-based methods and three reconstruction-based methods on 12 well-known univariate time series datasets from NASA, Yahoo, Numenta, and UCR. The results show that AER has the highest averaged F1 score across all datasets (a 23.5% improvement compared to ARIMA) while retaining a runtime similar to its vanilla auto-encoder and regressor components. Our model is available in Orion, an open-source benchmarking tool for time series anomaly detection.

Submitted 27 December, 2022; originally announced December 2022.

Comments: This work is accepted by IEEE BigData 2022. The paper contains 10 pages, 6 figures, and 4 tables

arXiv:2211.02990 [pdf, other]

Efficient convex PCA with applications to Wasserstein GPCA and ranked data

Authors: Steven Campbell, Ting-Kam Leonard Wong

Abstract: Convex PCA, which was introduced in Bigot et al. (2017), modifies Euclidean PCA by restricting the data and the principal components to lie in a given convex subset of a Hilbert space. This setting arises naturally in many applications, including distributional data in the Wasserstein space of an interval, and ranked compositional data under the Aitchison geometry. Our contribution in this paper is threefold. First, we present several new theoretical results including consistency as well as continuity and differentiability of the objective function in the finite dimensional case. Second, we develop a numerical implementation of finite dimensional convex PCA when the convex set is polyhedral, and show that this provides a natural approximation of Wasserstein GPCA. Third, we illustrate our results with two financial applications, namely distributions of stock returns ranked by size and the capital distribution curve, both of which are of independent interest in stochastic portfolio theory. Supplementary materials for this article are available online.

Submitted 30 August, 2024; v1 submitted 5 November, 2022; originally announced November 2022.

Comments: 43 pages, 10 figures, 4 tables, Code available at: https://github.com/stevenacampbell/ConvexPCA

arXiv:2105.07767 [pdf, other]

Projections with logarithmic divergences

Authors: Zhixu Tao, Ting-Kam Leonard Wong

Abstract: In information geometry, generalized exponential families and statistical manifolds with curvature are under active investigation in recent years. In this paper we consider the statistical manifold induced by a logarithmic $L^{(α)}$-divergence which generalizes the Bregman divergence. It is known that such a manifold is dually projectively flat with constant negative sectional curvature, and is closely related to the $\mathcal{F}^{(α)}$-family, a generalized exponential family introduced by the second author. Our main result constructs a dual foliation of the statistical manifold, i.e., an orthogonal decomposition consisting of primal and dual autoparallel submanifolds. This decomposition, which can be naturally interpreted in terms of primal and dual projections with respect to the logarithmic divergence, extends the dual foliation of a dually flat manifold studied by Amari. As an application, we formulate a new $L^{(α)}$-PCA problem which generalizes the exponential family PCA.

Submitted 8 May, 2021; originally announced May 2021.

Comments: 9 pages, 2 figures. To appear in GSI2021

arXiv:2003.04300 [pdf, other]

Learning Discrete State Abstractions With Deep Variational Inference

Authors: Ondrej Biza, Robert Platt, Jan-Willem van de Meent, Lawson L. S. Wong

Abstract: Abstraction is crucial for effective sequential decision making in domains with large state spaces. In this work, we propose an information bottleneck method for learning approximate bisimulations, a type of state abstraction. We use a deep neural encoder to map states onto continuous embeddings. We map these embeddings onto a discrete representation using an action-conditioned hidden Markov model, which is trained end-to-end with the neural network. Our method is suited for environments with high-dimensional states and learns from a stream of experience collected by an agent acting in a Markov decision process. Through this learned discrete abstract model, we can efficiently plan for unseen goals in a multi-goal Reinforcement Learning setting. We test our method in simplified robotic manipulation domains with image states. We also compare it against previous model-based approaches to finding bisimulations in discrete grid-world-like environments. Source code is available at https://github.com/ondrejba/discrete_abstractions.

Submitted 11 January, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: 15 pages, 7 figures

arXiv:2001.01328 [pdf, other]

Scalable Gradients for Stochastic Differential Equations

Authors: Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David Duvenaud

Abstract: The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations. We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. Specifically, we derive a stochastic differential equation whose solution is the gradient, a memory-efficient algorithm for caching noise, and conditions under which numerical solutions converge. In addition, we combine our method with gradient-based stochastic variational inference for latent stochastic differential equations. We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.

Submitted 18 October, 2020; v1 submitted 5 January, 2020; originally announced January 2020.

Comments: AISTATS 2020; 25 pages, 6 figures in main text; clarify notation in appendix

arXiv:1910.00762 [pdf, other]

Accelerating Deep Learning by Focusing on the Biggest Losers

Authors: Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai

Abstract: This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward pass to decide whether to use that example to compute gradients and update parameters, or to skip immediately to the next example. By reducing the number of computationally-expensive backpropagation steps performed, Selective-Backprop accelerates training. Evaluation on CIFAR10, CIFAR100, and SVHN, across a variety of modern image models, shows that Selective-Backprop converges to target error rates up to 3.5x faster than with standard SGD and between 1.02--1.8x faster than a state-of-the-art importance sampling approach. Further acceleration of 26% can be achieved by using stale forward pass results for selection, thus also skipping forward passes of low priority examples.

Submitted 1 October, 2019; originally announced October 2019.

arXiv:1810.10664 [pdf, other]

Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health

Authors: Gregory Yauney, Aman Rana, Lawrence C. Wong, Perikumar Javia, Ali Muftu, Pratik Shah

Abstract: Imaging fluorescent disease biomarkers in tissues and skin is a non-invasive method to screen for health conditions. We report an automated process that combines intraoral fluorescent porphyrin biomarker imaging, clinical examinations and machine learning for correlation of systemic health conditions with periodontal disease. 1215 intraoral fluorescent images, from 284 consenting adults aged 18-90, were analyzed using a machine learning classifier that can segment periodontal inflammation. The classifier achieved an AUC of 0.677 with precision and recall of 0.271 and 0.429, respectively, indicating a learned association between disease signatures in collected images. Periodontal diseases were more prevalent among males (p=0.0012) and older subjects (p=0.0224) in the screened population. Physicians independently examined the collected images, assigning localized modified gingival indices (MGIs). MGIs and periodontal disease were then cross-correlated with responses to a medical history questionnaire, blood pressure and body mass index measurements, and optic nerve, tympanic membrane, neurological, and cardiac rhythm imaging examinations. Gingivitis and early periodontal disease were associated with subjects diagnosed with optic nerve abnormalities (p <0.0001) in their retinal scans. We also report significant co-occurrences of periodontal disease in subjects reporting swollen joints (p=0.0422) and a family history of eye disease (p=0.0337). These results indicate cross-correlation of poor periodontal health with systemic health outcomes and stress the importance of oral health screenings at the primary care level. Our screening process and analysis method, using images and machine learning, can be generalized for automated diagnoses and systemic health screenings for other diseases.

Submitted 24 October, 2018; originally announced October 2018.

Comments: Submitted to IEEE Journal of Biomedical and Health Informatics, 2018

arXiv:1611.00800 [pdf]

Temporal Matrix Completion with Locally Linear Latent Factors for Medical Applications

Authors: Frodo Kin Sun Chan, Andy J Ma, Pong C Yuen, Terry Cheuk-Fung Yip, Yee-Kit Tse, Vincent Wai-Sun Wong, Grace Lai-Hung Wong

Abstract: Regular medical records are useful for medical practitioners to analyze and monitor patient health status especially for those with chronic disease, but such records are usually incomplete due to unpunctuality and absence of patients. In order to resolve the missing data problem over time, tensor-based model is suggested for missing data imputation in recent papers because this approach makes use of low rank tensor assumption for highly correlated data. However, when the time intervals between records are long, the data correlation is not high along temporal direction and such assumption is not valid. To address this problem, we propose to decompose a matrix with missing data into its latent factors. Then, the locally linear constraint is imposed on these factors for matrix completion in this paper. By using a publicly available dataset and two medical datasets collected from hospital, experimental results show that the proposed algorithm achieves the best performance by comparing with the existing methods.

Submitted 31 October, 2016; originally announced November 2016.

arXiv:1111.5487 [pdf, ps, other]

doi 10.1214/11-AOAS465

Generalized genetic association study with samples of related individuals

Authors: Zeny Feng, William W. L. Wong, Xin Gao, Flavio Schenkel

Abstract: Genetic association study is an essential step to discover genetic factors that are associated with a complex trait of interest. In this paper we present a novel generalized quasi-likelihood score (GQLS) test that is suitable for a study with either a quantitative trait or a binary trait. We use a logistic regression model to link the phenotypic value of the trait to the distribution of allelic frequencies. In our model, the allele frequencies are treated as a response and the trait is treated as a covariate that allows us to leave the distribution of the trait values unspecified. Simulation studies indicate that our method is generally more powerful in comparison with the family-based association test (FBAT) and controls the type I error at the desired levels. We apply our method to analyze data on Holstein cattle for an estimated breeding value phenotype, and to analyze data from the Collaborative Study of the Genetics of Alcoholism for alcohol dependence. The results show a good portion of significant SNPs and regions consistent with previous reports in the literature, and also reveal new significant SNPs and regions that are associated with the complex trait of interest.

Submitted 23 November, 2011; originally announced November 2011.

Comments: Published at http://dx.doi.org/10.1214/11-AOAS465 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS465

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 3, 2109-2130

Showing 1–13 of 13 results for author: Wong, L