Search | arXiv e-print repository

Estimation of Treatment Effects based on Kernel Matching

Authors: Chong Ding, Zheng Li, Hon Keung Tony Ng, Wei Gao

Abstract: The treatment effect represents the average causal impact or outcome difference between treatment and control groups. Treatment effects can be estimated through social experiments, regression models, matching estimators, and instrumental variables. In this paper, we introduce a novel kernel-matching estimator for treatment effect estimation. This method is particularly beneficial in observational studies where randomized control trials are not feasible, as it uses the full sample to increase the efficiency and robustness of treatment effect estimates. We demonstrate that the proposed estimator is consistent and asymptotically efficient under certain conditions. Through Monte Carlo simulations, we show that the estimator performs favorably against other estimators in the literature. Finally, we apply our method to data from the National Supported Work Demonstration to illustrate its practical application.

Submitted 15 February, 2025; originally announced February 2025.

arXiv:2410.17604 [pdf, other]

Ranking of Multi-Response Experiment Treatments

Authors: Miguel R. Pebes-Trujillo, Itamar Shenhar, Aravind Harikumar, Ittai Herrmann, Menachem Moshelion, Kee Woei Ng, Matan Gavish

Abstract: We present a probabilistic ranking model to identify the optimal treatment in multiple-response experiments. In contemporary practice, treatments are applied over individuals with the goal of achieving multiple ideal properties on them simultaneously. However, often there are competing properties, and the optimality of one cannot be achieved without compromising the optimality of another. Typically, we still want to know which treatment is the overall best. In our framework, we first formulate overall optimality in terms of treatment ranks. Then we infer the latent ranking that allows us to report treatments from optimal to least optimal, provided ideal desirable properties. We demonstrate through simulations and real data analysis how we can achieve reliability of inferred ranks in practice. We adopt a Bayesian approach and derive an associated Markov Chain Monte Carlo algorithm to fit our model to data. Finally, we discuss the prospects of adoption of our method as a standard tool for experiment evaluation in trials-based research.

Submitted 23 October, 2024; originally announced October 2024.

MSC Class: 68T05; 62H10; 62H12; 62H30 ACM Class: I.2; I.5; G.3

arXiv:2410.05757 [pdf, ps, other]

Temperature Optimization for Bayesian Deep Learning

Authors: Kenyon Ng, Chris van der Heide, Liam Hodgkinson, Susan Wei

Abstract: The Cold Posterior Effect (CPE) is a phenomenon in Bayesian Deep Learning (BDL), where tempering the posterior to a cold temperature often improves the predictive performance of the posterior predictive distribution (PPD). Although the term `CPE' suggests colder temperatures are inherently better, the BDL community increasingly recognizes that this is not always the case. Despite this, there remains no systematic method for finding the optimal temperature beyond grid search. In this work, we propose a data-driven approach to select the temperature that maximizes test log-predictive density, treating the temperature as a model parameter and estimating it directly from the data. We empirically demonstrate that our method performs comparably to grid search, at a fraction of the cost, across both regression and classification tasks. Finally, we highlight the differing perspectives on CPE between the BDL and Generalized Bayes communities: while the former primarily emphasizes the predictive performance of the PPD, the latter prioritizes the utility of the posterior under model misspecification; these distinct objectives lead to different temperature preferences.

Submitted 11 June, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

Comments: 11 pages (+5 reference, +17 appendix). Accepted at UAI 2025

arXiv:2410.05753 [pdf, other]

Pathwise Gradient Variance Reduction with Control Variates in Variational Inference

Authors: Kenyon Ng, Susan Wei

Abstract: Variational inference in Bayesian deep learning often involves computing the gradient of an expectation that lacks a closed-form solution. In these cases, pathwise and score-function gradient estimators are the most common approaches. The pathwise estimator is often favoured for its substantially lower variance compared to the score-function estimator, which typically requires variance reduction techniques. However, recent research suggests that even pathwise gradient estimators could benefit from variance reduction. In this work, we review existing control-variates-based variance reduction methods for pathwise gradient estimators to assess their effectiveness. Notably, these methods often rely on integrand approximations and are applicable only to simple variational families. To address this limitation, we propose applying zero-variance control variates to pathwise gradient estimators. This approach offers the advantage of requiring minimal assumptions about the variational distribution, other than being able to sample from it.

Submitted 8 October, 2024; originally announced October 2024.

Comments: 9 (+16 appendix) pages
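
The idea of attaching a control variate to a pathwise gradient estimator, as discussed in the abstract above, can be illustrated with a toy case. The sketch below is a generic scalar control variate, not the zero-variance control variates the paper proposes. It assumes a hypothetical integrand f(z) = z^3 with z = mu + eps and eps ~ N(0, 1), so each pathwise gradient sample is 3(mu + eps)^2 and the true gradient is 3(mu^2 + 1). The noise draw eps itself serves as the control variate because its mean is known to be zero; the coefficient beta is estimated from the same samples, which introduces a small bias that vanishes as the sample size grows.

```python
import random
import statistics


def pathwise_grad_samples(mu, n, rng):
    """Draw n pathwise gradient samples of E[(mu + eps)^3] with respect to mu."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    grads = [3.0 * (mu + e) ** 2 for e in eps]  # derivative of z^3 at z = mu + eps
    return grads, eps


def control_variate_adjust(grads, eps):
    """Subtract beta * eps from each sample, where beta = Cov(g, eps) / Var(eps)."""
    mean_g = statistics.fmean(grads)
    mean_e = statistics.fmean(eps)
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(grads, eps)) / (len(grads) - 1)
    beta = cov / statistics.variance(eps)
    # E[eps] = 0 is known exactly, so no mean correction term is needed.
    return [g - beta * e for g, e in zip(grads, eps)]


rng = random.Random(0)  # fixed seed so the run is repeatable
grads, eps = pathwise_grad_samples(mu=2.0, n=20000, rng=rng)
adjusted = control_variate_adjust(grads, eps)

plain_var = statistics.variance(grads)
cv_var = statistics.variance(adjusted)
cv_mean = statistics.fmean(adjusted)
print(f"variance without control variate: {plain_var:.1f}")
print(f"variance with control variate:    {cv_var:.1f}")
print(f"gradient estimate: {cv_mean:.2f} (true value 15)")
```

In this toy case the adjustment removes the part of the sample variance that is linear in eps, leaving only the contribution of eps^2.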

arXiv:2409.02372 [pdf, other]

A Principal Square Response Forward Regression Method for Dimension Reduction

Authors: Zheng Li, Yunhao Wang, Wei Gao, Hon Keung Tony Ng

Abstract: Dimension reduction techniques, such as Sufficient Dimension Reduction (SDR), are indispensable for analyzing high-dimensional datasets. This paper introduces a novel SDR method named Principal Square Response Forward Regression (PSRFR) for estimating the central subspace of the response variable Y, given the vector of predictor variables $\bm{X}$. We provide a computational algorithm for implementing PSRFR and establish its consistency and asymptotic properties. Monte Carlo simulations are conducted to assess the performance, efficiency, and robustness of the proposed method. Notably, PSRFR exhibits commendable performance in scenarios where the variance of each component becomes increasingly dissimilar, particularly when the predictor variables follow an elliptical distribution. Furthermore, we illustrate and validate the effectiveness of PSRFR using a real-world dataset concerning wine quality. Our findings underscore the utility and reliability of the PSRFR method in practical applications of dimension reduction for high-dimensional data analysis.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2408.09054 [pdf, other]

From Urban Clusters to Megaregions: Mapping Australia's Evolving Urban Regions

Authors: M. K. M Ng, Z. Shabrina, S. Sarkar, H. Han, C. Pettit

Abstract: This study employs percolation theory to investigate the hierarchical organisation of Australian urban centres through the connectivity of their road networks. The analysis demonstrates how discrete urban clusters have developed into integrated regional entities, delineating the pivotal distance thresholds that regulate these urban transitions. The study reveals the interconnections between disparate urban clusters, shaped by their functional differentiation and historical development. Furthermore, the study identifies a dichotomy of urban agglomeration forces and a persistent spatial disconnection between Australia's wider urban landscape. This highlights the interplay between urban densification and peripheral growth. It suggests the need for new thinking on potential integrated governance structures that bridge urban development with broader social and economic policies across regional and national scales. Additionally, the study emphasises the growing importance of national coordination in Australian urban development planning to ensure regional consistency, equity, and productivity.

Submitted 16 August, 2024; originally announced August 2024.
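
The distance-threshold view of percolation mentioned in the abstract above can be sketched with a small union-find routine. This is an illustrative simplification and not the study's pipeline: it assumes hypothetical planar points and straight-line distances in place of road-network intersections and network distances. Two points join the same cluster when they lie within the threshold of each other, so raising the threshold merges small clusters into larger ones.

```python
from math import dist


def clusters_at_threshold(points, threshold):
    """Group point indices into clusters linked by pairwise distance <= threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical settlements on a line: two close pairs and one remote point.
points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
small = clusters_at_threshold(points, 1.5)
medium = clusters_at_threshold(points, 4.5)
large = clusters_at_threshold(points, 15.0)
print(small)   # [[0, 1], [2, 3], [4]]
print(medium)  # [[0, 1, 2, 3], [4]]
print(large)   # [[0, 1, 2, 3, 4]]
```

The thresholds at which the cluster count drops (here between 1.5 and 4.5, and again between 4.5 and 15) play the role of the transition distances the abstract refers to.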

arXiv:2405.17464 [pdf, other]

Data Valuation by Leveraging Global and Local Statistical Information

Authors: Xiaoling Zhou, Ou Wu, Michael K. Ng, Hao Jiang

Abstract: Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications, particularly in machine learning tasks. There are diverse technical avenues to quantify the value of data within a corpus. While Shapley value-based methods are among the most widely used techniques in the literature due to their solid theoretical foundation, the accurate calculation of Shapley values is often intractable, leading to the proposal of numerous approximated calculation methods. Despite significant progress, nearly all existing methods overlook the utilization of distribution information of values within a data corpus. In this paper, we demonstrate that both global and local statistical information of value distributions hold significant potential for data valuation within the context of machine learning. Firstly, we explore the characteristics of both global and local value distributions across several simulated and real data corpora. Useful observations and clues are obtained. Secondly, we propose a new data valuation method that estimates Shapley values by incorporating the explored distribution characteristics into an existing method, AME. Thirdly, we present a new path to address the dynamic data valuation problem by formulating an optimization problem that integrates information of both global and local value distributions. Extensive experiments are conducted on Shapley value estimation, value-based data removal/adding, mislabeled data detection, and incremental/decremental data valuation. The results showcase the effectiveness and efficiency of our proposed methodologies, affirming the significant potential of global and local value distributions in data valuation.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 12 pages, 8 figures. arXiv admin note: text overlap with arXiv:2306.10577 by other authors

ACM Class: I.2
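
To make concrete why exact Shapley values are described as intractable in the abstract above, the following sketch computes them by brute force for a three-point toy corpus. It is not the paper's method and does not use AME; the utility function and the point names "a", "b", "c" are hypothetical. The loop visits every ordering of the points, so its cost grows factorially with corpus size, which is what motivates approximation methods.

```python
from itertools import permutations


def shapley_values(points, utility):
    """Exact Shapley value: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in points}
    count = 0
    for order in permutations(points):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
        count += 1
    return {p: total / count for p, total in totals.items()}


def toy_utility(coalition):
    """Hypothetical utility: 'a' and 'b' are redundant copies, 'c' is independent."""
    value = 0.0
    if "a" in coalition or "b" in coalition:
        value += 1.0  # either redundant point alone already earns this
    if "c" in coalition:
        value += 2.0
    return value


values = shapley_values(["a", "b", "c"], toy_utility)
print(values)  # {'a': 0.5, 'b': 0.5, 'c': 2.0}
```

The redundant points split their shared contribution equally, and the values sum to the utility of the full corpus (3.0), as the efficiency property of the Shapley value requires.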

arXiv:2401.11263 [pdf, other]

Estimating Heterogeneous Treatment Effects on Survival Outcomes Using Counterfactual Censoring Unbiased Transformations

Authors: Shenbo Xu, Raluca Cobzaru, Stan N. Finkelstein, Roy E. Welsch, Kenney Ng, Zach Shahn

Abstract: Methods for estimating heterogeneous treatment effects (HTE) from observational data have largely focused on continuous or binary outcomes, with less attention paid to survival outcomes and almost none to settings with competing risks. In this work, we develop censoring unbiased transformations (CUTs) for survival outcomes both with and without competing risks. After converting time-to-event outcomes using these CUTs, direct application of HTE learners for continuous outcomes yields consistent estimates of heterogeneous cumulative incidence effects, total effects, and separable direct effects. Our CUTs enable application of a much larger set of state-of-the-art HTE learners for censored outcomes than had previously been available, especially in competing risks settings. We provide generic model-free learner-specific oracle inequalities bounding the finite-sample excess risk. The oracle efficiency results depend on the oracle selector and estimated nuisance functions from all steps involved in the transformation. We demonstrate the empirical performance of the proposed methods in simulation studies.

Submitted 27 September, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

arXiv:2310.03758 [pdf, other]

A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing

Authors: Junren Chen, Jonathan Scarlett, Michael K. Ng, Zhaoqiang Liu

Abstract: In generative compressed sensing (GCS), we want to recover a signal $\mathbf{x}^* \in \mathbb{R}^n$ from $m$ measurements ($m\ll n$) using a generative prior $\mathbf{x}^*\in G(\mathbb{B}_2^k(r))$, where $G$ is typically an $L$-Lipschitz continuous generative model and $\mathbb{B}_2^k(r)$ represents the radius-$r$ $\ell_2$-ball in $\mathbb{R}^k$. Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. In this paper, we build a unified framework to derive uniform recovery guarantees for nonlinear GCS where the observation model is nonlinear and possibly discontinuous or unknown. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. Specifically, using a single realization of the sensing ensemble and generalized Lasso, all $\mathbf{x}^*\in G(\mathbb{B}_2^k(r))$ can be recovered up to an $\ell_2$-error at most $ε$ using roughly $\tilde{O}({k}/{ε^2})$ samples, with omitted logarithmic factors typically being dominated by $\log L$. Notably, this almost coincides with existing non-uniform guarantees up to logarithmic factors, hence the uniformity costs very little. As part of our technical contributions, we introduce the Lipschitz approximation to handle discontinuous observation models. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy. Experimental results are presented to corroborate our theory.

Submitted 9 October, 2023; v1 submitted 25 September, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2309.09032 [pdf, other]

Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors

Authors: Junren Chen, Michael K. Ng, Zhaoqiang Liu

Abstract: The problem of recovering a signal $\boldsymbol x\in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol x^\top\boldsymbol A_i\boldsymbol x,\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol A_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol A_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol x$. First, we consider a $k$-sparse $\boldsymbol x$ and introduce the thresholded Wirtinger flow (TWF) algorithm that does not require the sparsity level $k$. TWF comprises two steps: the spectral initialization that identifies a point sufficiently close to $\boldsymbol x$ (up to a sign flip) when $m=O(k^2\log n)$, and the thresholded gradient descent which, when provided a good initialization, produces a sequence linearly converging to $\boldsymbol x$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol x$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. With an estimate correlated with the signal, we develop the projected gradient descent (PGD) algorithm that also comprises two steps: the projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and the projected gradient descent that refines the $\ell_2$-error to $O(δ)$ at a geometric rate when $m=O(k\log\frac{Lrn}{δ^2})$. Experimental results corroborate our theoretical findings and show that: (i) our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization; (ii) leveraging the generative prior allows for precise image recovery in the MNIST dataset from a small number of quadratic measurements.

Submitted 29 October, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

arXiv:2308.16059 [pdf, other]

A Parameter-Free Two-Bit Covariance Estimator with Improved Operator Norm Error Rate

Authors: Junren Chen, Michael K. Ng

Abstract: A covariance matrix estimator using two bits per entry was recently developed by Dirksen, Maly and Rauhut [Annals of Statistics, 50(6), pp. 3538-3562]. The estimator achieves near minimax rate for general sub-Gaussian distributions, but also suffers from two downsides: theoretically, there is an essential gap on operator norm error between their estimator and sample covariance when the diagonal of the covariance matrix is dominated by only a few entries; practically, its performance heavily relies on the dithering scale, which needs to be tuned according to some unknown parameters. In this work, we propose a new 2-bit covariance matrix estimator that simultaneously addresses both issues. Unlike the sign quantizer associated with uniform dither in Dirksen et al., we adopt a triangular dither prior to a 2-bit quantizer inspired by the multi-bit uniform quantizer. By employing dithering scales varying across entries, our estimator enjoys an improved operator norm error rate that depends on the effective rank of the underlying covariance matrix rather than the ambient dimension, thus closing the theoretical gap. Moreover, our proposed method eliminates the need for any tuning parameter, as the dithering scales are entirely determined by the data. Experimental results under Gaussian samples are provided to showcase the impressive numerical performance of our estimator. Remarkably, by halving the dithering scales, our estimator oftentimes achieves operator norm errors less than twice the errors of the sample covariance.

Submitted 10 November, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Major changes. In particular, we further adapt our method to online settings (see Section 5)

arXiv:2305.02373 [pdf, other]

Efficient estimation of weighted cumulative treatment effects by double/debiased machine learning

Authors: Shenbo Xu, Bang Zheng, Bowen Su, Stan Finkelstein, Roy Welsch, Kenney Ng, Ioanna Tzoulaki, Zach Shahn

Abstract: In empirical studies with time-to-event outcomes, investigators often leverage observational data to conduct causal inference on the effect of exposure when randomized controlled trial data is unavailable. Model misspecification and lack of overlap are common issues in observational studies, and they often lead to inconsistent and inefficient estimators of the average treatment effect. Estimators targeting overlap weighted effects have been proposed to address the challenge of poor overlap, and methods enabling flexible machine learning for nuisance models address model misspecification. However, the approaches that allow machine learning for nuisance models have not been extended to the setting of weighted average treatment effects for time-to-event outcomes when there is poor overlap. In this work, we propose a class of one-step cross-fitted double/debiased machine learning estimators for the weighted cumulative causal effect as a function of restriction time. We prove that the proposed estimators are consistent, asymptotically linear, and reach semiparametric efficiency bounds under regularity conditions. Our simulations show that the proposed estimators using nonparametric machine learning nuisance models perform as well as established methods that require correctly-specified parametric nuisance models, illustrating that our estimators mitigate the need for oracle parametric nuisance models. We apply the proposed methods to real-world observational data from a UK primary care database to compare the effects of anti-diabetic drugs on cancer clinical outcomes.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2302.11197 [pdf, ps, other]

Quantized Low-Rank Multivariate Regression with Random Dithering

Authors: Junren Chen, Yueqi Wang, Michael K. Ng

Abstract: Low-rank multivariate regression (LRMR) is an important statistical learning model that combines highly correlated tasks as a multiresponse regression problem with a low-rank prior on the coefficient matrix. In this paper, we study quantized LRMR, a practical setting where the responses and/or the covariates are discretized to finite precision. We focus on the estimation of the underlying coefficient matrix. To make possible a consistent estimator that can achieve arbitrarily small error, we employ uniform quantization with random dithering, i.e., we add appropriate random noise to the data before quantization. Specifically, uniform dither and triangular dither are used for responses and covariates, respectively. Based on the quantized data, we propose the constrained Lasso and regularized Lasso estimators, and derive the non-asymptotic error bounds. With the aid of dithering, the estimators achieve minimax optimal rate, while quantization only slightly worsens the multiplicative factor in the error rate. Moreover, we extend our results to a low-rank regression model with matrix responses. We corroborate and demonstrate our theoretical results via simulations on synthetic data or image restoration.

Submitted 6 October, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: IEEE Transactions on Signal Processing (publication ready version)
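
The role of random dithering described in the abstract above ("we add appropriate random noise to the data before quantization") can be seen in a one-dimensional toy experiment. The sketch below is an assumption-laden illustration, not the paper's estimator: it uses a mid-rise uniform quantizer with resolution delta and a uniform dither on [-delta/2, delta/2], and it covers only the uniform-dither case, not the triangular dither used for covariates. Without dither the quantizer output is a fixed, biased value; with dither the average of many quantized draws approaches the original value.

```python
import math
import random


def uniform_quantize(x, delta):
    """Mid-rise uniform quantizer: map x to the centre of its length-delta cell."""
    return delta * (math.floor(x / delta) + 0.5)


def dithered_quantize(x, delta, rng):
    """Add uniform dither on [-delta/2, delta/2] before quantizing."""
    tau = rng.uniform(-delta / 2.0, delta / 2.0)
    return uniform_quantize(x + tau, delta)


rng = random.Random(0)  # fixed seed so the run is repeatable
x, delta, n = 0.3, 1.0, 200_000

plain = uniform_quantize(x, delta)  # deterministic output, off by 0.2 here
dithered_mean = sum(dithered_quantize(x, delta, rng) for _ in range(n)) / n

print(f"true value:                {x}")
print(f"quantized without dither:  {plain}")
print(f"mean of dithered outputs:  {dithered_mean:.3f}")
```

Each dithered output still takes only the coarse values -0.5 or 0.5 in this setup, yet their average recovers 0.3 up to Monte Carlo error, which is the property that allows estimation error to shrink as the sample size grows.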

arXiv:2301.11272 [pdf, other]

Location-based Activity Behavior Deviation Detection for Nursing Home using IoT Devices

Authors: Billy Pik Lik Lau, Zann Koh, Yuren Zhou, Benny Kai Kiat Ng, Chau Yuen, Mui Lang Low

Abstract: With the advancement of the Internet of Things (IoT) and pervasive computing applications, there is a better opportunity to understand the behavior of the aging population. However, in a nursing home scenario, common sensors and techniques used to track an elderly person living alone are not suitable. In this paper, we design a location-based tracking system for a four-story nursing home - The Salvation Army, Peacehaven Nursing Home in Singapore. The main challenge here is to identify the group activity among the nursing home's residents and to detect if they have any deviated activity behavior. We propose a location-based deviated activity behavior detection system to detect deviated activity behavior by leveraging a data fusion technique. In order to compute the features for data fusion, an adaptive method is applied for extracting the group and individual activity time and generating a daily hybrid norm for each of the residents. Next, deviated activity behavior detection is executed by considering the difference between daily norm patterns and daily input data for each resident. Lastly, the deviated activity behavior among the residents is classified using a rule-based classification approach. In the implementation, 44.4% of the residents show no deviated activity behavior, while 37% show one deviated activity behavior and 18.6% show two or more deviated activity behaviors.

Submitted 25 January, 2023; originally announced January 2023.

Comments: 12 pages

arXiv:2212.14562 [pdf, ps, other]

Quantizing Heavy-tailed Data in Statistical Estimation: (Near) Minimax Rates, Covariate Quantization, and Uniform Recovery

Authors: Junren Chen, Michael K. Ng, Di Wang

Abstract: This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to a uniform quantization. Our major standpoint is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all agreeing that the quantization only slightly worsens the multiplicative factor. Besides, we study compressed sensing where both covariate (i.e., sensing vector) and response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, all local minimizers are proved to enjoy near optimal error bound. Moreover, by the concentration inequality of product process and covering argument, we establish near minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.

Submitted 26 July, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

Comments: Major changes

arXiv:2208.08287 [pdf, ps, other]

Noisy Nonnegative Tucker Decomposition with Sparse Factors and Missing Data

Authors: Xiongjun Zhang, Michael K. Ng

Abstract: Tensor decomposition is a powerful tool for extracting physically meaningful latent factors from multi-dimensional nonnegative data, and has attracted increasing interest in a variety of fields such as image processing, machine learning, and computer vision. In this paper, we propose a sparse nonnegative Tucker decomposition and completion method for the recovery of underlying nonnegative data under noisy observations. Here the underlying nonnegative data tensor is decomposed into a core tensor and several factor matrices with all entries being nonnegative and the factor matrices being sparse. The loss function is derived by the maximum likelihood estimation of the noisy observations, and the $\ell_0$ norm is employed to enhance the sparsity of the factor matrices. We establish the error bound of the estimator of the proposed model under generic noise scenarios, which is then specialized to the observations with additive Gaussian noise, additive Laplace noise, and Poisson observations, respectively. Our theoretical results are better than those by existing tensor-based or matrix-based methods. Moreover, the minimax lower bounds are shown to be matched with the derived upper bounds up to logarithmic factors. Numerical examples on both synthetic and real-world data sets demonstrate the superiority of the proposed method for nonnegative tensor data completion.

Submitted 1 December, 2024; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2205.13827 [pdf, ps, other]

Error Bound of Empirical $\ell_2$ Risk Minimization for Noisy Standard and Generalized Phase Retrieval Problems

Authors: Junren Chen, Michael K. Ng

Abstract: In this paper, we study the estimation performance of empirical $\ell_2$ risk minimization (ERM) in noisy (standard) phase retrieval (NPR) given by $y_k = |\alpha_k^*x_0|^2+\eta_k$, or noisy generalized phase retrieval (NGPR) formulated as $y_k = x_0^*A_kx_0 + \eta_k$, where $x_0\in\mathbb{K}^d$ is the desired signal, $n$ is the sample size, and $\eta = (\eta_1,...,\eta_n)^\top$ is the noise vector. We establish new error bounds under different noise patterns, and our proofs are valid for both $\mathbb{K}=\mathbb{R}$ and $\mathbb{K}=\mathbb{C}$. In NPR under arbitrary noise vector $\eta$, we derive a new error bound $O\big(\|\eta\|_\infty\sqrt{\frac{d}{n}} + \frac{|\mathbf{1}^\top\eta|}{n}\big)$, which is tighter than the currently known one $O\big(\frac{\|\eta\|}{\sqrt{n}}\big)$ in many cases. In NGPR, we show $O\big(\|\eta\|\frac{\sqrt{d}}{n}\big)$ for arbitrary $\eta$. In both problems, the bounds for arbitrary noise immediately give rise to $\tilde{O}(\sqrt{\frac{d}{n}})$ for sub-Gaussian or sub-exponential random noise, with some conventional but inessential assumptions (e.g., independent or zero-mean condition) removed or weakened. In addition, we make a first attempt at ERM under heavy-tailed random noise assumed to have bounded $l$-th moment. To achieve a trade-off between bias and variance, we truncate the responses and propose a corresponding robust ERM estimator, which is shown to possess the guarantee $\tilde{O}\big(\big[\sqrt{\frac{d}{n}}\big]^{1-1/l}\big)$ in both NPR and NGPR. All the error bounds straightforwardly extend to the more general problems of rank-$r$ matrix recovery, and these results deliver a conclusion that the full-rank frame $\{A_k\}_{k=1}^n$ in NGPR is more robust to biased noise than the rank-1 frame $\{\alpha_k\alpha_k^*\}_{k=1}^n$ in NPR. Extensive experimental results are presented to illustrate our theoretical findings.

Submitted 28 June, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: 44 pages, 6 figures

arXiv:2202.13157 [pdf, ps, other]

High Dimensional Statistical Estimation under Uniformly Dithered One-bit Quantization

Authors: Junren Chen, Cheng-Long Wang, Michael K. Ng, Di Wang

Abstract: In this paper, we propose a uniformly dithered 1-bit quantization scheme for high-dimensional statistical estimation. The scheme contains truncation, dithering, and quantization as typical steps. As canonical examples, the quantization scheme is applied to sparse covariance matrix estimation, sparse linear regression (i.e., compressed sensing), and matrix completion. We study both sub-Gaussian and heavy-tailed regimes, where the underlying distribution of heavy-tailed data is assumed to have bounded moments of some order. We propose new estimators based on 1-bit quantized data. In the sub-Gaussian regime, our estimators achieve near minimax rates, indicating that our quantization scheme costs very little. In the heavy-tailed regime, while the rates of our estimators become essentially slower, these results are either the first ones in a 1-bit quantized and heavy-tailed setting, or already improve on existing comparable results in some respects. Under the observations in our setting, the rates are almost tight in compressed sensing and matrix completion. Our 1-bit compressed sensing results feature general sensing vectors that are sub-Gaussian or even heavy-tailed. We are also the first to investigate a novel setting where both the covariate and response are quantized. In addition, our approach to 1-bit matrix completion does not rely on likelihood and represents the first method robust to pre-quantization noise with unknown distribution. Experimental results on synthetic data are presented to support our theoretical analysis.

Submitted 20 January, 2023; v1 submitted 26 February, 2022; originally announced February 2022.

Comments: We add lower bounds for 1-bit quantization of heavy-tailed data (Theorems 11, 14)
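
The three steps named in the abstract above (truncation, dithering, quantization) can be illustrated with a minimal sketch. This is not the authors' estimator; it only demonstrates the standard property that, for a value $x$ with $|x| \le \Delta$ and dither $\tau$ uniform on $[-\Delta, \Delta]$, the quantity $\Delta \cdot \mathrm{sign}(x + \tau)$ has expectation $x$. The function names `dithered_one_bit` and `estimate_mean`, and the threshold name `delta`, are hypothetical and chosen for illustration.

```python
import random


def dithered_one_bit(x, delta, rng):
    """Return a single bit (+1.0 or -1.0) encoding x.

    Assumed illustrative pipeline: truncate to [-delta, delta],
    add uniform dither on [-delta, delta], then keep only the sign.
    """
    x_trunc = max(-delta, min(delta, x))   # truncation
    tau = rng.uniform(-delta, delta)       # uniform dithering
    return 1.0 if x_trunc + tau >= 0 else -1.0  # 1-bit quantization


def estimate_mean(samples, delta, rng):
    """Estimate the mean of the (truncated) samples from 1-bit data only.

    Uses E[delta * sign(x + tau)] = x for |x| <= delta.
    """
    bits = [dithered_one_bit(x, delta, rng) for x in samples]
    return delta * sum(bits) / len(bits)


if __name__ == "__main__":
    rng = random.Random(0)
    # Every sample equals 0.3, so the 1-bit estimate should be close to 0.3.
    print(estimate_mean([0.3] * 200000, delta=1.0, rng=rng))
```

The choice of `delta` reflects a bias-variance trade-off in this sketch: a larger threshold truncates less but makes each bit noisier.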

arXiv:2109.00749 [pdf, other]

Co-Separable Nonnegative Matrix Factorization

Authors: Junjun Pan, Michael K. Ng

Abstract: Nonnegative matrix factorization (NMF) is a popular model in the field of pattern recognition. It aims to find a low rank approximation for nonnegative data M by a product of two nonnegative matrices W and H. In general, NMF is NP-hard to solve, while it can be solved efficiently under the separability assumption, which requires that the columns of the factor matrix be equal to columns of the input matrix. In this paper, we generalize the separability assumption based on the 3-factor NMF M=P_1SP_2, and require that S is a sub-matrix of the input matrix. We refer to this NMF as a Co-Separable NMF (CoS-NMF). We discuss some mathematical properties of CoS-NMF, and present the relationships with other related matrix factorizations such as CUR decomposition, generalized separable NMF (GS-NMF), and bi-orthogonal tri-factorization (BiOR-NM3F). An optimization model for CoS-NMF is proposed and an alternating fast gradient method is employed to solve the model. Numerical experiments on synthetic datasets, document datasets and facial databases are conducted to verify the effectiveness of our CoS-NMF model. Compared to state-of-the-art methods, the CoS-NMF model performs very well in the co-clustering task, and preserves a good approximation to the input data matrix as well.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2106.06997 [pdf, other]

Post-hoc loss-calibration for Bayesian neural networks

Authors: Meet P. Vadera, Soumya Ghosh, Kenney Ng, Benjamin M. Marlin

Abstract: Bayesian decision theory provides an elegant framework for acting optimally under uncertainty when tractable posterior distributions are available. Modern Bayesian models, however, typically involve intractable posteriors that are approximated with, potentially crude, surrogates. This difficulty has engendered loss-calibrated techniques that aim to learn posterior approximations that favor high-utility decisions. In this paper, focusing on Bayesian neural networks, we develop methods for correcting approximate posterior predictive distributions encouraging them to prefer high-utility decisions. In contrast to previous work, our approach is agnostic to the choice of the approximate inference algorithm, allows for efficient test time decision making through amortization, and empirically produces higher quality decisions. We demonstrate the effectiveness of our approach through controlled experiments spanning a diversity of tasks and datasets.

Submitted 13 June, 2021; originally announced June 2021.

Comments: Accepted to Conference on Uncertainty in AI (UAI) '21

arXiv:2104.05449

Current Overview of Statistical Fiber Bundles Model and Its Application to Physics-based Reliability Analysis of Thin-film Dielectrics

Authors: James U. Gleaton, David Han, James D. Lynch, Hon Keung Tony Ng, Fabrizio Ruggeri

Abstract: In this paper, we present a critical overview of statistical fiber bundles models. We discuss relevant aspects, like assumptions and consequences stemming from models in the literature, and propose new ones. This is accomplished by concentrating on both the physical and statistical aspects of a specific load-sharing example, the breakdown (BD) for circuits of capacitors and related dielectrics. For series and parallel/series circuits (series/parallel reliability systems) of ordinary capacitors, the load-sharing rules are derived from the electrical laws. This, together with the BD formalism, is then used to obtain the BD distribution of the circuit. The BD distribution and Gibbs measure are given for a series circuit and the size effects are illustrated for simulations of series and parallel/series circuits. This is related to the finite weakest link adjustments for the BD distribution that arise in large series/parallel reliability load-sharing systems, such as dielectric BD, from their extreme value approximations. An elementary but in-depth discussion of the physical aspects of SiO$_2$ and HfO$_2$ dielectrics and cell models is given. This is used to study a load-sharing cell model for the BD of HfO$_2$ dielectrics and the BD formalism. The latter study is based on an analysis of the data of Kim and Lee (2004) for such dielectrics. Here, several BD distributions are compared in the analysis and proportional hazard regression models are used to study the BD formalism. In addition, some areas of open research are discussed.

Submitted 25 January, 2023; v1 submitted 9 April, 2021; originally announced April 2021.

Comments: The majority of the materials in the paper has been published as a book

arXiv:2103.10060 [pdf, other]

Approximating Probability Distributions by using Wasserstein Generative Adversarial Networks

Authors: Yihang Gao, Michael K. Ng, Mingjie Zhou

Abstract: Studied here are Wasserstein generative adversarial networks (WGANs) with GroupSort neural networks as their discriminators. It is shown that the error bound of the approximation for the target distribution depends on the width and depth (capacity) of the generators and discriminators and the number of samples in training. A quantified generalization bound is established for the Wasserstein distance between the generated and target distributions. According to the theoretical results, WGANs have a higher requirement for the capacity of discriminators than that of generators, which is consistent with some existing results. More importantly, the results with overly deep and wide (high-capacity) generators may be worse than those with low-capacity generators if discriminators are insufficiently strong. Numerical results obtained using Swiss roll and MNIST datasets confirm the theoretical results.

Submitted 29 June, 2023; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: Accepted by SIAM Journal on Mathematics of Data Science (SIMODS)

MSC Class: 68Q32; 68T15; 68W40

arXiv:2101.03725 [pdf, other]

doi 10.1109/JIOT.2021.3051343

The Study of Urban Residential's Public Space Activeness using Space-centric Approach

Authors: Billy Pik Lik Lau, Benny Kai Kiat Ng, Chau Yuen, Bige Tuncer, Keng Hua Chong

Abstract: With the advancement of the Internet of Things (IoT) and communication platforms, large scale sensor deployment can be easily implemented in an urban city to collect various information. To date, there are only a handful of research studies about understanding the usage of urban public spaces. Leveraging IoT, various sensors have been deployed in an urban residential area to monitor and study public space utilization patterns. In this paper, we propose a data processing system to generate space-centric insights about the utilization of an urban residential region of multiple points of interest (PoIs) that consists of 190,000 m$^2$ of real estate. We identify the activeness of each PoI based on spectral clustering, and then study their corresponding static features, which are composed of transportation, commercial facilities, and population density, along with other characteristics. Through heuristic feature inference, we find that residential density and commercial facilities are the most significant factors affecting public place utilization.

Submitted 11 January, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: Accepted at IEEE Internet of Things Journal 2021

arXiv:2012.08784 [pdf, other]

Tensor Completion by Multi-Rank via Unitary Transformation

Authors: Guang-Jing Song, Michael K. Ng, Xiongjun Zhang

Abstract: One of the key problems in tensor completion is the number of uniformly random sample entries required for recovery guarantee. The main aim of this paper is to study $n_1 \times n_2 \times n_3$ third-order tensor completion based on transformed tensor singular value decomposition, and provide a bound on the number of required sample entries. Our approach is to make use of the multi-rank of the underlying tensor instead of its tubal rank in the bound. In numerical experiments on synthetic and imaging data sets, we demonstrate the effectiveness of our proposed bound for the number of sample entries. Moreover, our theoretical results are valid for any unitary transformation applied to the $n_3$-dimension under transformed tensor singular value decomposition.

Submitted 24 January, 2022; v1 submitted 16 December, 2020; originally announced December 2020.

arXiv:2012.05488 [pdf, other]

doi 10.1109/JSYST.2020.3044325

Urban Space Insights Extraction using Acoustic Histogram Information

Authors: Nipun Wijerathne, Billy Pik Lik Lau, Benny Kai Kiat Ng, Chau Yuen

Abstract: Urban data mining can be identified as a high-potential area that can enhance smart city services towards better sustainable development, especially in urban residential activity tracking. While existing human activity tracking systems have demonstrated the capability to unveil the hidden aspects of citizens' behavior, they often come with a high implementation cost and require a large communication bandwidth. In this paper, we study the implementation of low-cost analogue sound sensors to detect outdoor activities and estimate the raining period in an urban residential area. The analogue sound sensor readings are transmitted to the cloud every 5 minutes in histogram format, which consists of sound data sampled every 100 ms (10 Hz). We then use wavelet transformation (WT) and principal component analysis (PCA) to generate a more robust and consistent feature set from the histogram. After that, we perform unsupervised clustering and attempt to understand the individual characteristics of each cluster to identify outdoor residential activities. In addition, on-site validation has been conducted to show the effectiveness of our approach.

Submitted 14 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

Comments: Accepted at IEEE Systems Journal

arXiv:2009.03998 [pdf, other]

Tangent Space Based Alternating Projections for Nonnegative Low Rank Matrix Approximation

Authors: Guangjing Song, Michael K. Ng, Tai-Xiang Jiang

Abstract: In this paper, we develop a new alternating projection method to compute nonnegative low rank matrix approximation for nonnegative matrices. In the nonnegative low rank matrix approximation method, the projection onto the manifold of fixed rank matrices can be expensive as the singular value decomposition is required. We propose to use the tangent space of the point in the manifold to approximate the projection onto the manifold in order to reduce the computational cost. We show that the sequence generated by the alternating projections onto the tangent spaces of the fixed rank matrices manifold and the nonnegative matrix manifold converges linearly to a point in the intersection of the two manifolds, where the convergent point is sufficiently close to optimal solutions. This convergence result, based on inexact projection onto the manifold, is new and has not been studied in the literature. Numerical examples in data clustering, pattern recognition and hyperspectral data analysis are given to demonstrate that the performance of the proposed method is better than that of nonnegative matrix factorization methods in terms of computational time and accuracy.

Submitted 2 September, 2020; originally announced September 2020.

arXiv:2008.00748 [pdf, other]

Tensorizing GAN with High-Order Pooling for Alzheimer's Disease Assessment

Authors: Wen Yu, Baiying Lei, Michael K. Ng, Albert C. Cheung, Yanyan Shen, Shuqiang Wang

Abstract: It is of great significance to apply deep learning for the early diagnosis of Alzheimer's Disease (AD). In this work, a novel tensorizing GAN with high-order pooling is proposed to assess Mild Cognitive Impairment (MCI) and AD. By tensorizing a three-player cooperative game based framework, the proposed model can benefit from the structural information of the brain. By incorporating the high-order pooling scheme into the classifier, the proposed model can make full use of the second-order statistics of the holistic Magnetic Resonance Imaging (MRI) images. To the best of our knowledge, the proposed Tensor-train, High-pooling and Semi-supervised learning based GAN (THS-GAN) is the first work to deal with classification on MRI images for AD diagnosis. Extensive experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset are reported to demonstrate that the proposed THS-GAN achieves superior performance compared with existing methods, and to show that both tensor-train and high-order pooling can enhance classification performance. The visualization of generated samples also shows that the proposed model can generate plausible samples for semi-supervised learning purposes.

Submitted 3 August, 2020; originally announced August 2020.

Comments: 15 pages, 20 figures

arXiv:2007.12355 [pdf, other]

Dynamic Knowledge Distillation for Black-box Hypothesis Transfer Learning

Authors: Yiqin Yu, Xu Min, Shiwan Zhao, Jing Mei, Fei Wang, Dongsheng Li, Kenney Ng, Shaochun Li

Abstract: In real world applications like healthcare, it is usually difficult to build a machine learning prediction model that works universally well across different institutions. At the same time, the available model is often proprietary, i.e., neither the model parameters nor the data set used for model training is accessible. In consequence, leveraging the knowledge hidden in the available model (a.k.a. the hypothesis) and adapting it to a local data set becomes extremely challenging. Motivated by this situation, in this paper we aim to address such a specific case within the hypothesis transfer learning framework, in which 1) the source hypothesis is a black-box model and 2) the source domain data is unavailable. In particular, we introduce a novel algorithm called dynamic knowledge distillation for hypothesis transfer learning (dkdHTL). In this method, we use knowledge distillation with an instance-wise weighting mechanism to adaptively transfer the "dark" knowledge from the source hypothesis to the target domain. The weighting coefficients of the distillation loss and the standard loss are determined by the consistency between the predicted probability of the source hypothesis and the target ground-truth label. Empirical results on both transfer learning benchmark datasets and a healthcare dataset demonstrate the effectiveness of our method.

Submitted 6 August, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: 7 pages, 2 figures

arXiv:2007.10626 [pdf, other]

Sparse Nonnegative Tensor Factorization and Completion with Noisy Observations

Authors: Xiongjun Zhang, Michael K. Ng

Abstract: In this paper, we study the sparse nonnegative tensor factorization and completion problem from partial and noisy observations for third-order tensors. Because of sparsity and nonnegativity, the underlying tensor is decomposed into the tensor-tensor product of one sparse nonnegative tensor and one nonnegative tensor. We propose to minimize the sum of the maximum likelihood estimation for the observations with nonnegativity constraints and the tensor $\ell_0$ norm for the sparse factor. We show that the error bounds of the estimator of the proposed model can be established under general noise observations. The detailed error bounds under specific noise distributions including additive Gaussian noise, additive Laplace noise, and Poisson observations can be derived. Moreover, the minimax lower bounds are shown to be matched with the established upper bounds up to a logarithmic factor of the sizes of the underlying tensor. These theoretical results for tensors are better than those obtained for matrices, and this illustrates the advantage of the use of nonnegative sparse tensor models for completion and denoising. Numerical experiments are provided to validate the superiority of the proposed tensor-based method compared with the matrix-based approach.

Submitted 20 October, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

arXiv:2006.11601 [pdf, other]

Rethinking Privacy Preserving Deep Learning: How to Evaluate and Thwart Privacy Attacks

Authors: Lixin Fan, Kam Woh Ng, Ce Ju, Tianyu Zhang, Chang Liu, Chee Seng Chan, Qiang Yang

Abstract: This paper investigates capabilities of Privacy-Preserving Deep Learning (PPDL) mechanisms against various forms of privacy attacks. First, we propose to quantitatively measure the trade-off between model accuracy and privacy losses incurred by reconstruction, tracing and membership attacks. Second, we formulate reconstruction attacks as solving a noisy system of linear equations, and prove that attacks are guaranteed to be defeated if condition (2) is unfulfilled. Third, based on theoretical analysis, a novel Secret Polarization Network (SPN) is proposed to thwart privacy attacks, which pose serious challenges to existing PPDL methods. Extensive experiments showed that model accuracies are improved on average by 5-20% compared with baseline mechanisms, in regimes where data privacy is satisfactorily protected.

Submitted 23 June, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

Comments: under review, 36 pages (updated Eq. 3 and Fig. 8)

arXiv:2002.04401 [pdf, other]

Understanding Crowd Behaviors in a Social Event by Passive WiFi Sensing and Data Mining

Authors: Yuren Zhou, Billy Pik Lik Lau, Zann Koh, Chau Yuen, Benny Kai Kiat Ng

Abstract: Understanding crowd behaviors in a large social event is crucial for event management. Passive WiFi sensing, by collecting WiFi probe requests sent from mobile devices, provides a better way to monitor crowds compared with people counters and cameras in terms of freedom from interference, larger coverage, lower cost, and more information on people's movement. In existing studies, however, not enough attention has been paid to the thorough analysis and mining of collected data. In particular, the power of machine learning has not been fully exploited. In this paper, therefore, we propose a comprehensive data analysis framework to fully analyze the collected probe requests to extract three types of patterns related to crowd behaviors in a large social event, with the help of statistics, visualization, and unsupervised machine learning. First, trajectories of the mobile devices are extracted from probe requests and analyzed to reveal the spatial patterns of the crowds' movement. Hierarchical agglomerative clustering is adopted to find the interconnections between different locations. Next, k-means and k-shape clustering algorithms are applied to extract temporal visiting patterns of the crowds by days and locations, respectively. Finally, by combining with time, trajectories are transformed into spatiotemporal patterns, which reveal how trajectory duration changes with trajectory length and how the overall trends of crowd movement change over time. The proposed data analysis framework is fully demonstrated using real-world data collected in a large social event. Results show that one can extract comprehensive patterns from data collected by a network of passive WiFi sensors.

Submitted 4 February, 2020; originally announced February 2020.

Comments: This manuscript has been accepted by IEEE Internet of Things journal. Copyright (c) 2020 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]

arXiv:1911.04655 [pdf, other]

Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning

Authors: Xinyan Dai, Xiao Yan, Kaiwen Zhou, Han Yang, Kelvin K. W. Ng, James Cheng, Yu Fan

Abstract: The high cost of communicating gradients is a major bottleneck for federated learning, as the bandwidth of the participating user devices is limited. Existing gradient compression algorithms are mainly designed for data centers with high-speed networks and achieve $O(\sqrt{d} \log d)$ per-iteration communication cost at best, where $d$ is the size of the model. We propose hyper-sphere quantization (HSQ), a general framework that can be configured to achieve a continuum of trade-offs between communication efficiency and gradient accuracy. In particular, at the high compression ratio end, HSQ provides a low per-iteration communication cost of $O(\log d)$, which is favorable for federated learning. We prove the convergence of HSQ theoretically and show by experiments that HSQ significantly reduces the communication cost of model training without hurting convergence accuracy.

Submitted 25 November, 2019; v1 submitted 11 November, 2019; originally announced November 2019.

arXiv:1910.09979 [pdf, other]

Orthogonal Nonnegative Tucker Decomposition

Authors: Junjun Pan, Michael K. Ng, Ye Liu, Xiongjun Zhang, Hong Yan

Abstract: In this paper, we study nonnegative tensor data and propose an orthogonal nonnegative Tucker decomposition (ONTD). We discuss some properties of ONTD and develop a convex relaxation algorithm of the augmented Lagrangian function to solve the optimization problem. The convergence of the algorithm is given. We employ ONTD on image data sets from real-world applications including face recognition, image representation, and hyperspectral unmixing. Numerical results are shown to illustrate the effectiveness of the proposed algorithm.

Submitted 27 October, 2019; v1 submitted 21 October, 2019; originally announced October 2019.

arXiv:1909.10679 [pdf]

Structural Change Analysis of Active Cryptocurrency Market

Authors: C. Y. Tan, Y. B. Koh, K. H. Ng, K. H. Ng

Abstract: Structural Change Analysis of Active Cryptocurrency Market

Submitted 23 September, 2019; originally announced September 2019.

Comments: 18 pages, 6 figures and 3 tables

arXiv:1907.01113 [pdf, other]

Robust Tensor Completion Using Transformed Tensor SVD

Authors: Guangjing Song, Michael K. Ng, Xiongjun Zhang

Abstract: In this paper, we study robust tensor completion by using transformed tensor singular value decomposition (SVD), which employs unitary transform matrices instead of the discrete Fourier transform matrix used in the traditional tensor SVD. The main motivation is that a lower tubal rank tensor can be obtained by using other unitary transform matrices than by using the discrete Fourier transform matrix. This would be more effective for robust tensor completion. Experimental results for hyperspectral, video and face datasets have shown that the recovery performance for the robust tensor completion problem by using transformed tensor SVD is better in PSNR than that by using Fourier transform and other robust tensor completion methods.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1906.01167 [pdf, other]

Towards Fair and Privacy-Preserving Federated Deep Models

Authors: Lingjuan Lyu, Jiangshan Yu, Karthik Nandakumar, Yitong Li, Xingjun Ma, Jiong Jin, Han Yu, Kee Siong Ng

Abstract: The current standalone deep learning framework tends to result in overfitting and low utility. This problem can be addressed by either a centralized framework that deploys a central server to train a global model on the joint data from all parties, or a distributed framework that leverages a parameter server to aggregate local model updates. Server-based solutions are prone to the problem of a single-point-of-failure. In this respect, collaborative learning frameworks, such as federated learning (FL), are more robust. Existing federated learning frameworks overlook an important aspect of participation: fairness. All parties are given the same final model without regard to their contributions. To address these issues, we propose a decentralized Fair and Privacy-Preserving Deep Learning (FPPDL) framework to incorporate fairness into federated deep learning models. In particular, we design a local credibility mutual evaluation mechanism to guarantee fairness, and a three-layer onion-style encryption scheme to guarantee both accuracy and privacy. Unlike the existing FL paradigm, under FPPDL, each participant receives a different version of the FL model with performance commensurate with his contributions. Experiments on benchmark datasets demonstrate that FPPDL balances fairness, privacy and accuracy. It enables federated learning ecosystems to detect and isolate low-contribution parties, thereby promoting responsible participation.

Submitted 19 May, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: Accepted for publication in TPDS

arXiv:1904.11652 [pdf, other]

doi 10.1109/TVCG.2020.2985689

DPVis: Visual Analytics with Hidden Markov Models for Disease Progression Pathways

Authors: Bum Chul Kwon, Vibha Anand, Kristen A Severson, Soumya Ghosh, Zhaonan Sun, Brigitte I Frohnert, Markus Lundgren, Kenney Ng

Abstract: Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and make inferences of health states for patients. Despite the advantages of using the algorithms for discovering interesting patterns, it still remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and clinically make sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal to investigate disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this study, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.

Submitted 9 April, 2020; v1 submitted 25 April, 2019; originally announced April 2019.

Comments: to appear at IEEE Transactions on Visualization and Computer Graphics

arXiv:1904.10126 [pdf]

Lung Nodule Classification using Deep Local-Global Networks

Authors: Mundher Al-Shabi, Boon Leong Lan, Wai Yee Chan, Kwan-Hoong Ng, Maxine Tan

Abstract: Purpose: Lung nodules have very diverse shapes and sizes, which makes classifying them as benign/malignant a challenging problem. In this paper, we propose a novel method to predict the malignancy of nodules that has the capability to analyze the shape and size of a nodule using a global feature extractor, as well as the density and structure of the nodule using a local feature extractor. Methods: We propose to use Residual Blocks with a 3x3 kernel size for local feature extraction, and Non-Local Blocks to extract the global features. The Non-Local Block has the ability to extract global features without using a huge number of parameters. The key idea behind the Non-Local Block is to apply matrix multiplications between features on the same feature maps. Results: We trained and validated the proposed method on the LIDC-IDRI dataset, which contains 1,018 computed tomography (CT) scans. We followed a rigorous procedure for the experimental setup, namely 10-fold cross-validation, and ignored the nodules that had been annotated by fewer than 3 radiologists. The proposed method achieved state-of-the-art results with AUC=95.62%, while significantly outperforming other baseline methods. Conclusions: Our proposed Deep Local-Global network has the capability to accurately extract both local and global features. Our new method outperforms state-of-the-art architectures including Densenet and Resnet with transfer learning.

Submitted 22 April, 2019; originally announced April 2019.

Comments: Code and dataset available here https://github.com/mundher/local-global

arXiv:1902.10393 [pdf, other]

doi 10.1214/20-BA1204

Using prior expansions for prior-data conflict checking

Authors: David J. Nott, Max Seah, Luai Al-Labadi, Michael Evans, Hui Khoon Ng, Berthold-Georg Englert

Abstract: Any Bayesian analysis involves combining information represented through different model components, and when different sources of information are in conflict it is important to detect this. Here we consider checking for prior-data conflict in Bayesian models by expanding the prior used for the analysis into a larger family of priors, and considering a marginal likelihood score statistic for the expansion parameter. Consideration of different expansions can be informative about the nature of any conflict, and extensions to hierarchically specified priors and connections with other approaches to prior-data conflict checking are discussed. Implementation in complex situations is illustrated with two applications. The first concerns testing for the appropriateness of a LASSO penalty in shrinkage estimation of coefficients in linear regression. Our method is compared with a recent suggestion in the literature designed to be powerful against alternatives in the exponential power family, and we use this family as the prior expansion for constructing our check. A second application concerns a problem in quantum state estimation, where a multinomial model is considered with physical constraints on the model parameters. In this example, the usefulness of different prior expansions is demonstrated for obtaining checks which are sensitive to different aspects of the prior.

Submitted 12 March, 2020; v1 submitted 27 February, 2019; originally announced February 2019.

Comments: Accepted version, to appear in Bayesian Analysis

arXiv:1901.08551 [pdf]

A Universal Logic Operator for Interpretable Deep Convolution Networks

Authors: KamWoh Ng, Lixin Fan, Chee Seng Chan

Abstract: Explaining neural network computation in terms of probabilistic/fuzzy logical operations has attracted much attention due to its simplicity and high interpretability. Different choices of logical operators such as AND, OR and XOR give rise to another dimension for network optimization, and in this paper, we study the open problem of learning a universal logical operator without prescribing any logical operations manually. Insightful observations along this exploration furnish deep convolution networks with a novel logical interpretation.

Submitted 20 January, 2019; originally announced January 2019.

Comments: In AAAI-19 Workshop on Network Interpretability for Deep Learning

arXiv:1811.06094 [pdf, other]

Unsupervised learning with contrastive latent variable models

Authors: Kristen Severson, Soumya Ghosh, Kenney Ng

Abstract: In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. These pairs of datasets occur commonly, for instance a population of interest vs. control or signal vs. signal-free recordings. However, there are few methods that work on sets of data as opposed to data points or sequences. Here, we present a probabilistic model for dimensionality reduction to discover signal that is enriched in the target dataset relative to the background dataset. The data in these sets do not need to be paired or grouped beyond set membership. By using a probabilistic model where some structure is shared amongst the two datasets and some is unique to the target dataset, we are able to recover interesting structure in the latent space of the target dataset. The method also has the advantages of a probabilistic model, namely that it allows for the incorporation of prior information, handles missing data, and can be generalized to different distributional assumptions. We describe several possible variations of the model and demonstrate the application of the technique to de-noising, feature selection, and subgroup discovery settings.

Submitted 14 November, 2018; originally announced November 2018.

arXiv:1811.01506 [pdf, other]

Theoretical and Experimental Analysis on the Generalizability of Distribution Regression Network

Authors: Connie Kou, Hwee Kuan Lee, Jorge Sanz, Teck Khim Ng

Abstract: There is emerging interest in performing regression between distributions. In contrast to prediction on single instances, these machine learning methods can be useful for population-based studies or on problems that are inherently statistical in nature. The recently proposed distribution regression network (DRN) has shown superior performance for the distribution-to-distribution regression task compared to conventional neural networks. However, in Kou et al. (2018) and some other works on distribution regression, there is a lack of comprehensive comparative study on both theoretical basis and generalization abilities of the methods. We derive some mathematical properties of DRN and qualitatively compare it to conventional neural networks. We also perform comprehensive experiments to study the generalizability of distribution regression models, by studying their robustness to limited training data, data sampling noise and task difficulty. DRN consistently outperforms conventional neural networks, requiring fewer training data and maintaining robust performance with noise. Furthermore, the theoretical properties of DRN can be used to provide some explanation on the ability of DRN to achieve better generalization performance than conventional neural networks.

Submitted 31 May, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

arXiv:1810.01072 [pdf, other]

doi 10.1016/j.csda.2019.03.011

A flexible sequential Monte Carlo algorithm for parametric constrained regression

Authors: Kenyon Ng, Berwin A. Turlach, Kevin Murray

Abstract: An algorithm is proposed that enables the imposition of shape constraints on regression curves, without requiring the constraints to be written as closed-form expressions, nor assuming the functional form of the loss function. This algorithm is based on Sequential Monte Carlo-Simulated Annealing and only relies on an indicator function that assesses whether or not the constraints are fulfilled, thus allowing the enforcement of various complex constraints by specifying an appropriate indicator function without altering other parts of the algorithm. The algorithm is illustrated by fitting rational function and B-spline regression models subject to a monotonicity constraint. An implementation of the algorithm using R is freely available on GitHub.

Submitted 1 April, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

Comments: Typo corrections. Code available on https://github.com/weiyaw/blackbox

Journal ref: Computational Statistics & Data Analysis 138 (2019) 13-26

arXiv:1804.04775 [pdf, other]

A Compact Network Learning Model for Distribution Regression

Authors: Connie Kou, Hwee Kuan Lee, Teck Khim Ng

Abstract: Despite the superior performance of deep learning in many applications, challenges remain in the area of regression on function spaces. In particular, neural networks are unable to encode function inputs compactly as each node encodes just a real value. We propose a novel idea to address this shortcoming: to encode an entire function in a single network node. To that end, we design a compact network representation that encodes and propagates functions in single nodes for the distribution regression task. Our proposed Distribution Regression Network (DRN) achieves higher prediction accuracies while being much more compact and using fewer parameters than traditional neural networks.

Submitted 10 July, 2018; v1 submitted 12 April, 2018; originally announced April 2018.

arXiv:1804.02655 [pdf, ps, other]

Efficient Computational Algorithm for Optimal Continuous Experimental Designs

Authors: Jiangtao Duan, Wei Gao, Hon Keung Tony Ng

Abstract: A simple yet efficient computational algorithm for computing the continuous optimal experimental design for linear models is proposed. An alternative proof of the monotonic convergence for the $D$-optimal criterion on continuous design spaces is provided. We further show that the proposed algorithm converges to the $D$-optimal design. We also provide an algorithm for $A$-optimality and conjecture that the algorithm converges monotonically on continuous design spaces. Different numerical examples are used to demonstrate the usefulness and performance of the proposed algorithms.

Submitted 8 April, 2018; originally announced April 2018.

arXiv:1804.01957 [pdf, ps, other]

A Class of Skewed Distributions with Applications in Environmental Data

Authors: Indranil Ghosh, Hon Keung Tony Ng

Abstract: In environmental studies, many data are typically skewed and it is desired to have a flexible statistical model for this kind of data. In this paper, we study a class of skewed distributions by invoking arguments as described by Ferreira and Steel (2006, Journal of the American Statistical Association, 101: 823--829). In particular, we consider using the logistic kernel to derive a class of univariate distributions called the truncated-logistic skew symmetric (TLSS) distribution. We provide some structural properties of the proposed distribution and develop the statistical inference for the TLSS distribution. A simulation study is conducted to investigate the efficacy of the maximum likelihood method. For illustrative purposes, two real data sets from environmental studies are used to exhibit the applicability of such a model.

Submitted 5 April, 2018; originally announced April 2018.

MSC Class: 60E; 62F

arXiv:1802.09933 [pdf, ps, other]

Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization

Authors: Fanhua Shang, Yuanyuan Liu, Kaiwen Zhou, James Cheng, Kelvin K. W. Ng, Yuichi Yoshida

Abstract: In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA. In order to make sufficient decrease for stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of stochastic variance reduction algorithms such as SVRG-SD and SAGA-SD as a byproduct. We introduce a coefficient to scale current iterate and to satisfy the sufficient decrease property, which takes the decisions to shrink, expand or even move in the opposite direction, and then give two specific update rules of the coefficient for Lasso and ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, which show that our algorithms attain linear convergence rates. We also provide the convergence guarantees of our algorithms for non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.

Submitted 25 February, 2018; originally announced February 2018.

Comments: 24 pages, 10 figures, AISTATS 2018. arXiv admin note: text overlap with arXiv:1703.06807

arXiv:1802.06476 [pdf, other]

doi 10.1109/TKDE.2019.2904060

Simultaneous Modeling of Multiple Complications for Risk Profiling in Diabetes Care

Authors: Bin Liu, Ying Li, Soumya Ghosh, Zhaonan Sun, Kenney Ng, Jianying Hu

Abstract: Type 2 diabetes mellitus (T2DM) is a chronic disease that often results in multiple complications. Risk prediction and profiling of T2DM complications is critical for healthcare professionals to design personalized treatment plans for patients in diabetes care for improved outcomes. In this paper, we study the risk of developing complications after the initial T2DM diagnosis from longitudinal patient records. We propose a novel multi-task learning approach to simultaneously model multiple complications where each task corresponds to the risk modeling of one complication. Specifically, the proposed method strategically captures the relationships (1) between the risks of multiple T2DM complications, (2) between the different risk factors, and (3) between the risk factor selection patterns. The method uses coefficient shrinkage to identify an informative subset of risk factors from high-dimensional data, and uses a hierarchical Bayesian framework to allow domain knowledge to be incorporated as priors. The proposed method is favorable for healthcare applications because in addition to improved prediction performance, relationships among the different risks and risk factors are also identified. Extensive experimental results on a large electronic medical claims database show that the proposed method outperforms state-of-the-art models by a significant margin. Furthermore, we show that the risk associations learned and the risk factors identified lead to meaningful clinical insights.

Submitted 18 February, 2018; originally announced February 2018.

Journal ref: IEEE Transactions on Knowledge and Data Engineering, 2019

arXiv:1710.01581 [pdf, other]

doi 10.1109/JIOT.2017.2748987

Sensor Fusion for Public Space Utilization Monitoring in a Smart City

Authors: Billy Pik Lik Lau, Nipun Wijerathne, Benny Kai Kiat Ng, Chau Yuen

Abstract: Public space utilization is crucial for urban developers to understand how efficiently a place is being occupied in order to improve existing or future infrastructures. In a smart cities approach, implementing public space monitoring with Internet-of-Things (IoT) sensors appears to be a viable solution. However, the choice of sensors is often a challenging problem and is often linked with scalability, coverage, energy consumption, accuracy, and privacy. To get the most from low-cost sensors with the aforementioned design considerations in mind, we propose data processing modules for capturing public space utilization with a Renewable Wireless Sensor Network (RWSN) platform using pyroelectric infrared (PIR) and analog sound sensors. We first propose a calibration process to remove false alarms of the PIR sensor due to the impact of weather and environment. We then demonstrate how the sound sensor data can be processed to provide various insights into a public space. Lastly, we fuse both sensors and study the utilization of a particular public space based on one month of data to unveil its usage.

Submitted 5 October, 2017; v1 submitted 14 September, 2017; originally announced October 2017.

arXiv:1708.00601 [pdf, other]

Exact Tensor Completion from Sparsely Corrupted Observations via Convex Optimization

Authors: Jonathan Q. Jiang, Michael K. Ng

Abstract: This paper conducts a rigorous analysis for provable estimation of multidimensional arrays, in particular third-order tensors, from a random subset of its corrupted entries. Our study rests heavily on a recently proposed tensor algebraic framework in which we can obtain tensor singular value decomposition (t-SVD) that is similar to the SVD for matrices, and define a new notion of tensor rank referred to as the tubal rank. We prove that by simply solving a convex program, which minimizes a weighted combination of tubal nuclear norm, a convex surrogate for the tubal rank, and the $\ell_1$-norm, one can recover an incoherent tensor exactly with overwhelming probability, provided that its tubal rank is not too large and that the corruptions are reasonably sparse. Interestingly, our result includes the recovery guarantees for the problems of tensor completion (TC) and tensor principal component analysis (TRPCA) under the same algebraic setup as special cases. An alternating direction method of multipliers (ADMM) algorithm is presented to solve this optimization problem. Numerical experiments verify our theory and real-world applications demonstrate the effectiveness of our algorithm.

Submitted 2 August, 2017; originally announced August 2017.

Comments: 36 pages, 9 figures

Showing 1–50 of 58 results for author: Ng, K