-
Feature Preserving Shrinkage on Bayesian Neural Networks via the R2D2 Prior
Authors:
Tsai Hor Chan,
Dora Yan Zhang,
Guosheng Yin,
Lequan Yu
Abstract:
Bayesian neural networks (BNNs) treat neural network weights as random variables, which aim to provide posterior uncertainty estimates and avoid overfitting by performing inference on the posterior weights. However, the selection of appropriate prior distributions remains a challenging task, and BNNs may suffer from catastrophic inflated variance or poor predictive performance when poor choices are made for the priors. Existing BNN designs apply different priors to weights, while the behaviours of these priors make it difficult to sufficiently shrink noisy signals or they are prone to overshrinking important signals in the weights. To alleviate this problem, we propose a novel R2D2-Net, which imposes the R^2-induced Dirichlet Decomposition (R2D2) prior to the BNN weights. The R2D2-Net can effectively shrink irrelevant coefficients towards zero, while preventing key features from over-shrinkage. To approximate the posterior distribution of weights more accurately, we further propose a variational Gibbs inference algorithm that combines the Gibbs updating procedure and gradient-based optimization. This strategy enhances stability and consistency in estimation when the variational objective involving the shrinkage parameters is non-convex. We also analyze the evidence lower bound (ELBO) and the posterior concentration rates from a theoretical perspective. Experiments on both natural and medical image classification and uncertainty estimation tasks demonstrate satisfactory performance of our method.
Submitted 23 May, 2025;
originally announced May 2025.
-
Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery
Authors:
Kangning Cui,
Wei Tang,
Rongkun Zhu,
Manqi Wang,
Gregory D. Larsen,
Victor P. Pauca,
Sarra Alqahtani,
Fan Yang,
David Segurado,
Paul Fine,
Jordan Karubian,
Raymond H. Chan,
Robert J. Plemmons,
Jean-Michel Morel,
Miles R. Silman
Abstract:
Understanding the spatial distribution of palms within tropical forests is essential for effective ecological monitoring, conservation strategies, and the sustainable integration of natural forest products into local and global supply chains. However, the analysis of remotely sensed data in these environments faces significant challenges, such as overlapping palm and tree crowns, uneven shading across the canopy surface, and the heterogeneous nature of the forest landscapes, which often affect the performance of palm detection and segmentation algorithms. To overcome these issues, we introduce PalmDSNet, a deep learning framework for real-time detection, segmentation, and counting of canopy palms. Additionally, we employ a bimodal reproduction algorithm that simulates palm spatial propagation to further enhance the understanding of these point patterns using PalmDSNet's results. We used UAV-captured imagery to create orthomosaics from 21 sites across western Ecuadorian tropical forests, covering a gradient from the everwet Chocó forests near Colombia to the drier forests of southwestern Ecuador. Expert annotations were used to create a comprehensive dataset, including 7,356 bounding boxes on image patches and 7,603 palm centers across five orthomosaics, encompassing a total area of 449 hectares. By combining PalmDSNet with the bimodal reproduction algorithm, which optimizes parameters for both local and global spatial variability, we effectively simulate the spatial distribution of palms in diverse and dense tropical environments, validating its utility for advanced applications in tropical forest monitoring and remote sensing analysis.
Submitted 14 October, 2024;
originally announced October 2024.
-
Dynamic Bayesian Networks with Conditional Dynamics in Edge Addition and Deletion
Authors:
Lupe S. H. Chan,
Amanda M. Y. Chu,
Mike K. P. So
Abstract:
This study presents a dynamic Bayesian network framework that facilitates intuitive gradual edge changes. We use two conditional dynamics to model the edge addition and deletion, and edge selection separately. Unlike previous research that uses a mixture network approach, which restricts the number of possible edge changes, or structural priors to induce gradual changes, which can lead to unclear network evolution, our model induces more frequent and intuitive edge change dynamics. We employ Markov chain Monte Carlo (MCMC) sampling to estimate the model structures and parameters and demonstrate the model's effectiveness in a portfolio selection application.
Submitted 7 May, 2025; v1 submitted 13 September, 2024;
originally announced September 2024.
-
Graphical copula GARCH modeling with dynamic conditional dependence
Authors:
Lupe Shun Hin Chan,
Amanda Man Ying Chu,
Mike Ka Pui So
Abstract:
Modeling returns on large portfolios is a challenging problem as the number of parameters in the covariance matrix grows as the square of the size of the portfolio. Traditional correlation models, for example, the dynamic conditional correlation (DCC)-GARCH model, often ignore the nonlinear dependencies in the tail of the return distribution. In this paper, we aim to develop a framework to model the nonlinear dependencies dynamically, namely the graphical copula GARCH (GC-GARCH) model. Motivated by the capital asset pricing model, the number of parameters can be greatly reduced by introducing conditional independence among stocks given some risk factors, which allows modeling of large portfolios. The joint distribution of the risk factors is factorized using a directed acyclic graph (DAG) with pair-copula construction (PCC) to enhance the modeling of the tails of the return distribution while offering the flexibility of having complex dependence structures. The DAG induces topological orders on the risk factors, which can be regarded as a list of directions of the flow of information. The conditional distributions among stock returns are also modeled using PCC. Dynamic conditional dependence structures are incorporated to allow the parameters in the copulas to be time-varying. Three-stage estimation is used to estimate parameters in the marginal distributions, the risk factor copulas, and the stock copulas. The simulation study shows that the proposed estimation procedure can estimate the parameters and the underlying DAG structure accurately. In the investment experiment of the empirical study, we demonstrate that the GC-GARCH model produces more precise conditional value-at-risk prediction and considerably higher cumulative portfolio returns than the DCC-GARCH model.
Submitted 21 June, 2024;
originally announced June 2024.
-
PCH-EM: A solution to information loss in the photon transfer method
Authors:
Aaron J. Hendrickson,
David P. Haefner,
Stanley H. Chan,
Nicholas R. Shade,
Eric R. Fossum
Abstract:
Working from a Poisson-Gaussian noise model, a multi-sample extension of the Photon Counting Histogram Expectation Maximization (PCH-EM) algorithm is derived as a general-purpose alternative to the Photon Transfer (PT) method. This algorithm is derived from the same model, requires the same experimental data, and estimates the same sensor performance parameters as the time-tested PT method, all while obtaining lower uncertainty estimates. It is shown that as read noise becomes large, multiple data samples are necessary to capture enough information about the parameters of a device under test, justifying the need for a multi-sample extension. An estimation procedure is devised consisting of initial PT characterization followed by repeated iteration of PCH-EM to demonstrate the improvement in estimate uncertainty achievable with PCH-EM, particularly in the regime of Deep Sub-Electron Read Noise (DSERN). A statistical argument based on the information-theoretic concept of sufficiency is formulated to explain how PT data reduction procedures discard information contained in raw sensor data, thus explaining why the proposed algorithm is able to obtain lower uncertainty estimates of key sensor performance parameters such as read noise and conversion gain. Experimental data captured from a CMOS quanta image sensor with DSERN is then used to demonstrate the algorithm's usage and validate the underlying theory and statistical model. In support of the reproducible research effort, the code associated with this work can be obtained on the MathWorks File Exchange (Hendrickson et al., 2024).
Submitted 7 March, 2024;
originally announced March 2024.
-
Effects of multi-dimensionality and energy exchange on electrostatic current-driven plasma instabilities and turbulence
Authors:
Wai Hong Ronald Chan,
Kentaro Hara,
Iain D. Boyd
Abstract:
Large-amplitude current-driven plasma instabilities, which can transition to the Buneman instability, were observed in one-dimensional (1D) simulations to generate high-energy backstreaming ions. We investigate the saturation of multi-dimensional plasma instabilities and its effects on energetic ion formation. Such ions directly impact spacecraft thruster lifetimes and are associated with magnetic reconnection and cosmic ray inception. An Eulerian Vlasov--Poisson solver employing the grid-based direct kinetic method is used to study the growth and saturation of 2D2V collisionless, electrostatic current-driven instabilities spanning two dimensions each in the configuration (D) and velocity (V) spaces supporting ion and electron phase-space transport. Four stages characterise the electric potential evolution in such instabilities: linear modal growth, harmonic growth, accelerated growth via quasi-linear mechanisms alongside non-linear fill-in, and saturated turbulence. Its transition and isotropisation process bears considerable similarities to the development of hydrodynamic turbulence. While a tendency to isotropy is observed in the plasma waves, followed by electron and then ion phase space after several ion-acoustic periods, the formation of energetic backstreaming ions is more limited in the 2D2V than in the 1D1V simulations. Plasma waves formed by two-dimensional electrostatic kinetic instabilities can propagate in the direction perpendicular to the net electron drift. Thus, large-amplitude multi-dimensional waves generate high-energy transverse-streaming ions and eventually limit energetic backward-streaming ions along the longitudinal direction. The multi-dimensional study sheds light on interactions between longitudinal and transverse electrostatic plasma instabilities, as well as fundamental characteristics of the inception and sustenance of unmagnetised plasma turbulence.
Submitted 5 January, 2024;
originally announced January 2024.
-
Superpixel-based and Spatially-regularized Diffusion Learning for Unsupervised Hyperspectral Image Clustering
Authors:
Kangning Cui,
Ruoning Li,
Sam L. Polk,
Yinyi Lin,
Hongsheng Zhang,
James M. Murphy,
Robert J. Plemmons,
Raymond H. Chan
Abstract:
Hyperspectral images (HSIs) provide exceptional spatial and spectral resolution of a scene, crucial for various remote sensing applications. However, the high dimensionality, presence of noise and outliers, and the need for precise labels of HSIs present significant challenges to HSI analysis, motivating the development of performant HSI clustering algorithms. This paper introduces a novel unsupervised HSI clustering algorithm, Superpixel-based and Spatially-regularized Diffusion Learning (S2DL), which addresses these challenges by incorporating rich spatial information encoded in HSIs into diffusion geometry-based clustering. S2DL employs the Entropy Rate Superpixel (ERS) segmentation technique to partition an image into superpixels, then constructs a spatially-regularized diffusion graph using the most representative high-density pixels. This approach reduces computational burden while preserving accuracy. Cluster modes, serving as exemplars for underlying cluster structure, are identified as the highest-density pixels farthest in diffusion distance from other highest-density pixels. These modes guide the labeling of the remaining representative pixels from ERS superpixels. Finally, majority voting is applied to the labels assigned within each superpixel to propagate labels to the rest of the image. This spatial-spectral approach simultaneously simplifies graph construction, reduces computational cost, and improves clustering performance. S2DL's performance is illustrated with extensive experiments on three publicly available, real-world HSIs: Indian Pines, Salinas, and Salinas A. Additionally, we apply S2DL to landscape-scale, unsupervised mangrove species mapping in the Mai Po Nature Reserve, Hong Kong, using a Gaofen-5 HSI. The success of S2DL in these diverse numerical experiments indicates its efficacy on a wide range of important unsupervised remote sensing analysis tasks.
Submitted 24 December, 2023;
originally announced December 2023.
-
Thresholding the higher criticism test statistics for optimality in a heterogeneous setting
Authors:
Hock Peng Chan
Abstract:
Donoho and Kipnis (2022) showed that the higher criticism (HC) test statistic has a non-Gaussian phase transition, but remarked that it is probably not optimal, in the detection of sparse differences between two large frequency tables when the counts are low. The setting can be considered to be heterogeneous, with cells containing larger total counts more able to detect smaller differences. We provide a general study here of sparse detection arising from such heterogeneous settings, and show that optimality of the HC test statistic requires thresholding, for example in the case of frequency table comparison, to restrict to p-values of cells with total counts exceeding a threshold. The use of thresholding also leads to optimality of the HC test statistic when it is applied to the sparse Poisson means model of Arias-Castro and Wang (2015). The phase transitions we consider here are non-Gaussian, and involve an interplay between the rate functions of the response and sample size distributions. We also show, both theoretically and in a numerical study, that applying thresholding to the Bonferroni test statistic results in better sparse mixture detection in heterogeneous settings.
Submitted 7 November, 2023;
originally announced November 2023.
-
Variational Information Pursuit with Large Language and Multimodal Models for Interpretable Predictions
Authors:
Kwan Ho Ryan Chan,
Aditya Chattopadhyay,
Benjamin David Haeffele,
Rene Vidal
Abstract:
Variational Information Pursuit (V-IP) is a framework for making interpretable predictions by design by sequentially selecting a short chain of task-relevant, user-defined and interpretable queries about the data that are most informative for the task. While this allows for built-in interpretability in predictive models, applying V-IP to any task requires data samples with dense concept-labeling by domain experts, limiting the application of V-IP to small-scale tasks where manual data annotation is feasible. In this work, we extend the V-IP framework with Foundational Models (FMs) to address this limitation. More specifically, we use a two-step process, first leveraging Large Language Models (LLMs) to generate a sufficiently large candidate set of task-relevant interpretable concepts, then using Large Multimodal Models to annotate each data sample by semantic similarity with each concept in the generated concept set. While other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) require an additional step of removing repetitive and non-discriminative concepts to have good interpretability and test performance, we mathematically and empirically justify that, with a sufficiently informative and task-relevant query (concept) set, the proposed FM+V-IP method does not require any type of concept filtering. In addition, we show that FM+V-IP with LLM-generated concepts can achieve better test performance than V-IP with human-annotated concepts, demonstrating the effectiveness of LLMs at generating efficient query sets. Finally, when compared to other interpretable-by-design frameworks such as CBMs, FM+V-IP can achieve competitive test performance using fewer concepts/queries, with both filtered and unfiltered concept sets.
Submitted 24 August, 2023;
originally announced August 2023.
-
Variational Information Pursuit for Interpretable Predictions
Authors:
Aditya Chattopadhyay,
Kwan Ho Ryan Chan,
Benjamin D. Haeffele,
Donald Geman,
René Vidal
Abstract:
There is a growing interest in the machine learning community in developing predictive algorithms that are "interpretable by design". Towards this end, recent work proposes to make interpretable decisions by sequentially asking interpretable queries about data until a prediction can be made with high confidence based on the answers obtained (the history). To promote short query-answer chains, a greedy procedure called Information Pursuit (IP) is used, which adaptively chooses queries in order of information gain. Generative models are employed to learn the distribution of query-answers and labels, which is in turn used to estimate the most informative query. However, learning and inference with a full generative model of the data is often intractable for complex tasks. In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP which bypasses the need for learning generative models. V-IP is based on finding a query selection strategy and a classifier that minimizes the expected cross-entropy between true and predicted labels. We then demonstrate that the IP strategy is the optimal solution to this problem. Therefore, instead of learning generative models, we can use our optimal strategy to directly pick the most informative query given any history. We then develop a practical algorithm by defining a finite-dimensional parameterization of our strategy and classifier using deep networks and train them end-to-end using our objective. Empirically, V-IP is 10-100x faster than IP on different Vision and NLP tasks with competitive performance. Moreover, V-IP finds much shorter query chains when compared to reinforcement learning which is typically used in sequential-decision-making problems. Finally, we demonstrate the utility of V-IP on challenging tasks like medical diagnosis where the performance is far superior to the generative modelling approach.
Submitted 15 February, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Spectral analysis of multidimensional current-driven plasma instabilities and turbulence in hollow cathode plumes
Authors:
Wai Hong Ronald Chan,
Ken Hara,
Jonathan M. Wang,
Suhas S. Jain,
Shahab Mirjalili,
Iain D. Boyd
Abstract:
Large-amplitude current-driven instabilities in hollow cathode plumes can generate energetic ions responsible for cathode sputtering and spacecraft degradation. A 2D2V (two dimensions each in configuration [D] and velocity [V] spaces) grid-based Vlasov--Poisson (direct kinetic) solver is used to study their growth and saturation, which comprises four stages: linear growth, quasilinear resonance, nonlinear fill-in, and saturated turbulence. The linear modal growth rate, nonlinear saturation process, and ion velocity and energy distribution features in the turbulent regime are analyzed. Backstreaming ions are generated for large electron drifts, several ion acoustic periods after the potential field becomes turbulent. Interscale phase-space transfer and locality are analyzed for the Vlasov equation. The multidimensional study sheds light on the interactions between longitudinal and transverse plasma instabilities, as well as the inception of plasma turbulence.
Submitted 20 November, 2022; v1 submitted 20 October, 2022;
originally announced October 2022.
-
The Dynamics of Drop Breakup in Breaking Waves
Authors:
Wai Hong Ronald Chan
Abstract:
Breaking surface waves generate drops of a broad range of sizes that have a significant influence on regional and global climates, as well as the identification of ship movements. Characterizing these phenomena requires a fundamental understanding of the underlying mechanisms behind drop production. The interscale nature of these mechanisms also influences the development of models that enable cost-effective computation of large-scale waves. Interscale locality implies the universality of small scales and the suitability of generic subgrid-scale (SGS) models, while interscale nonlocality points to the potential dependence of the small scales on larger-scale geometry configurations and the corresponding need for tailored SGS models instead. A recently developed analysis toolkit combining theoretical population balance models, multiphase numerical simulations, and structure-tracking algorithms is used to probe the nature of drop production and its corresponding interscale mass-transfer characteristics above the surface of breaking waves. The results from the application of this toolkit suggest that while drop breakup is a somewhat scale-nonlocal process, its interscale transfer signature suggests that it is likely capillary-dominated and thus sensitive not to the specific nature of large-scale wave breaking, but rather to the specific geometry of the parent drops.
Submitted 10 October, 2022;
originally announced October 2022.
-
Semi-supervised Change Detection of Small Water Bodies Using RGB and Multispectral Images in Peruvian Rainforests
Authors:
Kangning Cui,
Seda Camalan,
Ruoning Li,
Victor P. Pauca,
Sarra Alqahtani,
Robert J. Plemmons,
Miles Silman,
Evan N. Dethier,
David Lutz,
Raymond H. Chan
Abstract:
Artisanal and Small-scale Gold Mining (ASGM) is an important source of income for many households, but it can have large social and environmental effects, especially in rainforests of developing countries. The Sentinel-2 satellites collect multispectral images that can be used to detect changes in water extent and quality, which indicate the locations of mining sites. This work focuses on the recognition of ASGM activities in Peruvian Amazon rainforests. We tested several semi-supervised classifiers based on Support Vector Machines (SVMs) to detect the changes of water bodies from 2019 to 2021 in the Madre de Dios region, which is one of the global hotspots of ASGM activities. Experiments show that SVM-based models can achieve reasonable performance for both RGB (Cohen's $κ$ = 0.49) and 6-channel images (Cohen's $κ$ = 0.71) with very limited annotations. The efficacy of incorporating Lab color space for change detection is analyzed as well.
Submitted 19 June, 2022;
originally announced June 2022.
-
Unsupervised Spatial-spectral Hyperspectral Image Reconstruction and Clustering with Diffusion Geometry
Authors:
Kangning Cui,
Ruoning Li,
Sam L. Polk,
James M. Murphy,
Robert J. Plemmons,
Raymond H. Chan
Abstract:
Hyperspectral images, which store a hundred or more spectral bands of reflectance, have become an important data source in natural and social sciences. Hyperspectral images are often generated in large quantities at a relatively coarse spatial resolution. As such, unsupervised machine learning algorithms incorporating known structure in hyperspectral imagery are needed to analyze these images automatically. This work introduces the Spatial-Spectral Image Reconstruction and Clustering with Diffusion Geometry (DSIRC) algorithm for partitioning highly mixed hyperspectral images. DSIRC reduces measurement noise through a shape-adaptive reconstruction procedure. In particular, for each pixel, DSIRC locates spectrally correlated pixels within a data-adaptive spatial neighborhood and reconstructs that pixel's spectral signature using those of its neighbors. DSIRC then locates high-density, high-purity pixels far in diffusion distance (a data-dependent distance metric) from other high-density, high-purity pixels and treats these as cluster exemplars, giving each a unique label. Non-modal pixels are assigned the label of their diffusion distance-nearest neighbor of higher density and purity that is already labeled. Strong numerical results indicate that incorporating spatial information through image reconstruction substantially improves the performance of pixel-wise clustering.
Submitted 28 April, 2022;
originally announced April 2022.
-
A 3-stage Spectral-spatial Method for Hyperspectral Image Classification
Authors:
Raymond H. Chan,
Ruoning Li
Abstract:
Hyperspectral images often have hundreds of spectral bands of different wavelengths captured by aircraft or satellites that record land coverage. Identifying detailed classes of pixels becomes feasible due to the enhancement in spectral and spatial resolution of hyperspectral images. In this work, we propose a novel framework that utilizes both spatial and spectral information for classifying pixels in hyperspectral images. The method consists of three stages. In the first stage, the pre-processing stage, the Nested Sliding Window algorithm is used to reconstruct the original data by enhancing the consistency of neighboring pixels, and then Principal Component Analysis is used to reduce the dimension of the data. In the second stage, Support Vector Machines are trained to estimate the pixel-wise probability map of each class using the spectral information from the images. Finally, a smoothed total variation model is applied to smooth the class probability vectors by ensuring spatial connectivity in the images. We demonstrate the superiority of our method against three state-of-the-art algorithms on six benchmark hyperspectral data sets with 10 to 50 training labels for each class. The results show that our method gives the overall best performance in accuracy. In particular, our gain in accuracy increases as the number of labeled pixels decreases, so the method is especially advantageous for problems with small training sets. It is therefore of great practical significance, since expert annotations are often expensive and difficult to collect.
△ Less
Submitted 20 April, 2022;
originally announced April 2022.
-
Unsupervised detection of ash dieback disease (Hymenoscyphus fraxineus) using diffusion-based hyperspectral image clustering
Authors:
Sam L. Polk,
Aland H. Y. Chan,
Kangning Cui,
Robert J. Plemmons,
David A. Coomes,
James M. Murphy
Abstract:
Ash dieback (Hymenoscyphus fraxineus) is an introduced fungal disease that is causing the widespread death of ash trees across Europe. Remote sensing hyperspectral images encode rich structure that has been exploited for the detection of dieback disease in ash trees using supervised machine learning techniques. However, to understand the state of forest health at landscape-scale, accurate unsupervised approaches are needed. This article investigates the use of the unsupervised Diffusion and VCA-Assisted Image Segmentation (D-VIS) clustering algorithm for the detection of ash dieback disease in a forest site near Cambridge, United Kingdom. The unsupervised clustering presented in this work has high overlap with the supervised classification of previous work on this scene (overall accuracy = 71%). Thus, unsupervised learning may be used for the remote detection of ash dieback disease without the need for expert labeling.
Submitted 19 April, 2022;
originally announced April 2022.
-
Classification of Hyperspectral Images Using SVM with Shape-adaptive Reconstruction and Smoothed Total Variation
Authors:
Ruoning Li,
Kangning Cui,
Raymond H. Chan,
Robert J. Plemmons
Abstract:
In this work, a novel algorithm called SVM with Shape-adaptive Reconstruction and Smoothed Total Variation (SaR-SVM-STV) is introduced to classify hyperspectral images, which makes full use of spatial and spectral information. The Shape-adaptive Reconstruction (SaR) is introduced to preprocess each pixel based on the Pearson Correlation between pixels in its shape-adaptive (SA) region. Support Vector Machines (SVMs) are trained to estimate the pixel-wise probability maps of each class. Then the Smoothed Total Variation (STV) model is applied to denoise and generate the final classification map. Experiments show that SaR-SVM-STV outperforms the SVM-STV method with a few training labels, demonstrating the significance of reconstructing hyperspectral images before classification.
Submitted 14 April, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Unsupervised Diffusion and Volume Maximization-Based Clustering of Hyperspectral Images
Authors:
Sam L. Polk,
Kangning Cui,
Aland H. Y. Chan,
David A. Coomes,
Robert J. Plemmons,
James M. Murphy
Abstract:
Hyperspectral images taken from aircraft or satellites contain information from hundreds of spectral bands, within which lie latent lower-dimensional structures that can be exploited for classifying vegetation and other materials. A disadvantage of working with hyperspectral images is that, due to an inherent trade-off between spectral and spatial resolution, they have a relatively coarse spatial scale, meaning that single pixels may correspond to spatial regions containing multiple materials. This article introduces the Diffusion and Volume maximization-based Image Clustering (D-VIC) algorithm for unsupervised material clustering to address this problem. By directly incorporating pixel purity into its labeling procedure, D-VIC gives greater weight to pixels that correspond to a spatial region containing just a single material. D-VIC is shown to outperform comparable state-of-the-art methods in extensive experiments on a range of hyperspectral images, including land-use maps and highly mixed forest health surveys (in the context of ash dieback disease), implying that it is well-equipped for unsupervised material clustering of spectrally-mixed hyperspectral datasets.
Submitted 19 February, 2023; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Nearly Unstable Integer-Valued ARCH Process and Unit Root Testing
Authors:
Wagner Barreto-Souza,
Ngai Hang Chan
Abstract:
This paper introduces a Nearly Unstable INteger-valued AutoRegressive Conditional Heteroskedasticity (NU-INARCH) process for dealing with count time series data. It is proved that a proper normalization of the NU-INARCH process endowed with a Skorohod topology weakly converges to a Cox-Ingersoll-Ross diffusion. The asymptotic distribution of the conditional least squares estimator of the correlation parameter is established as a functional of certain stochastic integrals. Numerical experiments based on Monte Carlo simulations are provided to verify the behavior of the asymptotic distribution under finite samples. These simulations reveal that the nearly unstable approach provides satisfactory and better results than those based on the stationarity assumption even when the true process is not that close to non-stationarity. A unit root test is proposed and its Type-I error and power are examined via Monte Carlo simulations. As an illustration, the proposed methodology is applied to the daily number of deaths due to COVID-19 in the United Kingdom.
Submitted 16 July, 2021;
originally announced July 2021.
-
ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction
Authors:
Kwan Ho Ryan Chan,
Yaodong Yu,
Chong You,
Haozhi Qi,
John Wright,
Yi Ma
Abstract:
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation. We argue that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, which shares common characteristics of modern deep networks. The deep layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer via forward propagation, although they are amenable to fine-tuning via back propagation. All components of the so-obtained "white-box" network have precise optimization, statistical, and geometric interpretations. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation in the invariant setting suggests a trade-off between sparsity and invariance, and also indicates that such a deep convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments clearly verify the effectiveness of both the rate reduction objective and the associated ReduNet. All code and data are available at https://github.com/Ma-Lab-Berkeley.
Submitted 28 November, 2021; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Student-Teacher Learning from Clean Inputs to Noisy Inputs
Authors:
Guanzhe Hong,
Zhiyuan Mao,
Xiaojun Lin,
Stanley H. Chan
Abstract:
Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring knowledge from a pre-trained teacher network to the student network. Furthermore, recent empirical results demonstrate that the teacher's features can boost the student network's generalization even when the student's input sample is corrupted by noise. However, there is a lack of theoretical insight into why and when this method of transferring knowledge can be successful between such heterogeneous tasks. We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks. We identify three factors vital to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control over any of the three factors leads to failure of the student-teacher learning method.
Submitted 12 March, 2021;
originally announced March 2021.
-
Deep Networks from the Principle of Rate Reduction
Authors:
Kwan Ho Ryan Chan,
Yaodong Yu,
Chong You,
Haozhi Qi,
John Wright,
Yi Ma
Abstract:
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer. The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme. All components of this "white box" network have precise optimization, statistical, and geometric interpretations. This principled framework also reveals and justifies the role of multi-channel lifting and sparse coding in the early stages of deep networks. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments indicate that the so-constructed deep network can already learn a good discriminative representation even without any back propagation training.
Submitted 27 October, 2020;
originally announced October 2020.
-
Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels
Authors:
Harris Chan,
Jamie Kiros,
William Chan
Abstract:
A channel corresponds to a viewpoint or transformation of an underlying meaning. A pair of parallel sentences in English and French express the same underlying meaning, but through two separate channels corresponding to their languages. In this work, we present the Multichannel Generative Language Model (MGLM). MGLM is a generative joint distribution model over channels. MGLM marginalizes over all possible factorizations within and across all channels. MGLM enables flexible inference, including unconditional generation, conditional generation (where one channel is observed and the other channels are generated), and partially observed generation (where incomplete observations are spread across all the channels). We experiment with the Multi30K dataset containing English, French, Czech, and German. We demonstrate experiments with unconditional, conditional, and partially conditional generation. We provide qualitative samples drawn unconditionally from the generative joint distribution. We also quantitatively analyze the quality-diversity trade-offs and find that MGLM outperforms traditional bilingual discriminative models.
Submitted 9 October, 2020;
originally announced October 2020.
-
Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
Authors:
Silviu Pitis,
Harris Chan,
Stephen Zhao,
Bradly Stadie,
Jimmy Ba
Abstract:
What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test time) goal distribution is too distant to offer a useful learning signal, we argue that the agent should not pursue unobtainable goals. Instead, it should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution. We propose to optimize this objective by having the agent pursue past achieved goals in sparsely explored areas of the goal space, which focuses exploration on the frontier of the achievable goal set. We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks including maze navigation and block stacking.
Submitted 6 July, 2020;
originally announced July 2020.
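One way to picture the selection rule described in the abstract above (pursue past achieved goals that lie in sparsely explored regions of the goal space) is the minimal sketch below. It assumes a simple grid-count density in place of whatever density estimator the authors actually use; `to_cell`, `pick_frontier_goal`, and `cell_size` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from typing import List, Tuple

Goal = Tuple[float, float]
Cell = Tuple[int, int]


def to_cell(goal: Goal, cell_size: float) -> Cell:
    """Map a 2-D goal to the grid cell that contains it (assumed discretization)."""
    return (int(goal[0] // cell_size), int(goal[1] // cell_size))


def pick_frontier_goal(achieved: List[Goal], cell_size: float = 1.0) -> Goal:
    """Return the past achieved goal whose grid cell has been visited least often.

    Ties are broken in favor of the earliest goal in the list.
    """
    if not achieved:
        raise ValueError("need at least one achieved goal")
    counts = Counter(to_cell(g, cell_size) for g in achieved)
    return min(achieved, key=lambda g: counts[to_cell(g, cell_size)])


# Three achieved goals cluster near the origin; one lies in a rarely visited cell.
achieved_goals = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9), (2.5, 0.1)]
next_goal = pick_frontier_goal(achieved_goals)
```

Here `next_goal` is the isolated point, since its cell has a single visit while the cell at the origin has three. A count-based grid only works in low-dimensional goal spaces; it stands in for a proper density model purely to keep the sketch short.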
-
Quasi-conformal Geometry based Local Deformation Analysis of Lateral Cephalogram for Childhood OSA Classification
Authors:
Hei-Long Chan,
Hoi-Man Yuen,
Chun-Ting Au,
Kate Ching-Ching Chan,
Albert Martin Li,
Lok-Ming Lui
Abstract:
Craniofacial profile is one of the anatomical causes of obstructive sleep apnea (OSA). According to medical research, cephalometry provides information on patients' skeletal structures and soft tissues. In this work, a novel approach to cephalometric analysis using quasi-conformal geometry based local deformation information is proposed for OSA classification. Our study is a retrospective analysis based on 60 case-control pairs with accessible lateral cephalometry and polysomnography (PSG) data. By using quasi-conformal geometry to study the local deformation around 15 landmark points, and combining the results with three linear distances between landmark points, a total of 1218 information features were obtained per subject. An L2-norm-based classification model was built. In experiments, our proposed model achieves 92.5% testing accuracy.
Submitted 31 May, 2020;
originally announced June 2020.
-
Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction
Authors:
Yaodong Yu,
Kwan Ho Ryan Chan,
Chong You,
Chaobing Song,
Yi Ma
Abstract:
To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class. We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features. The coding rate can be accurately computed from finite samples of degenerate subspace-like distributions and can learn intrinsic representations in supervised, self-supervised, and unsupervised settings in a unified manner. Empirically, the representations learned using this principle alone are significantly more robust to label corruptions in classification than those using cross-entropy, and can lead to state-of-the-art results in clustering mixed data from self-learned invariant features.
Submitted 15 June, 2020;
originally announced June 2020.
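The "coding rate difference" named in the abstract above can be written out as follows. This is a sketch in assumed notation, none of which appears in the abstract itself: $Z \in \mathbb{R}^{d \times m}$ holds $m$ learned features of dimension $d$, $\epsilon$ is the allowed coding distortion, and $\Pi_j$ is a diagonal membership matrix selecting the samples of class $j$ out of $k$ classes.

```latex
% Coding rate of the whole feature set
R(Z, \epsilon) = \frac{1}{2} \log\det\!\left( I + \frac{d}{m\,\epsilon^{2}}\, Z Z^{\top} \right)

% Coding rate summed over classes, each weighted by its share of the samples
R_c(Z, \epsilon \mid \Pi) = \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2m}
  \log\det\!\left( I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top} \right)

% Rate reduction: the quantity to be maximized
\Delta R(Z, \Pi, \epsilon) = R(Z, \epsilon) - R_c(Z, \epsilon \mid \Pi)
```

Read this way, maximizing $\Delta R$ asks the features as a whole to spread out (large $R$) while each class stays compact (small $R_c$).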
-
One Size Fits All: Can We Train One Denoiser for All Noise Levels?
Authors:
Abhiram Gnansambandam,
Stanley H. Chan
Abstract:
When training an estimator such as a neural network for tasks like image denoising, it is often preferred to train one estimator and apply it to all noise levels. The de facto training protocol to achieve this goal is to train the estimator with noisy samples whose noise levels are uniformly distributed across the range of interest. However, why should we allocate the samples uniformly? Can we have more training samples that are less noisy, and fewer samples that are more noisy? What is the optimal distribution? How do we obtain such a distribution? The goal of this paper is to address this training sample distribution problem from a minimax risk optimization perspective. We derive a dual ascent algorithm to determine the optimal sampling distribution, whose convergence is guaranteed as long as the set of admissible estimators is closed and convex. For estimators with non-convex admissible sets such as deep neural networks, our dual formulation converges to a solution of the convex relaxation. We discuss how the algorithm can be implemented in practice. We evaluate the algorithm on linear estimators and deep networks.
Submitted 16 July, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report
Authors:
Qi She,
Fan Feng,
Qi Liu,
Rosa H. M. Chan,
Xinyue Hao,
Chuanlin Lan,
Qihan Yang,
Vincenzo Lomonaco,
German I. Parisi,
Heechul Bae,
Eoin Brophy,
Baoquan Chen,
Gabriele Graffieti,
Vidit Goel,
Hyonyoung Han,
Sathursan Kanagarajah,
Somesh Kumar,
Siew-Kei Lam,
Tin Lun Lam,
Liang Ma,
Davide Maltoni,
Lorenzo Pellegrini,
Duvindu Piyasena,
Shiliang Pu,
Debdoot Sheet
, et al. (11 additional authors not shown)
Abstract:
This report summarizes the IROS 2019 Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed to drive lifelong/continual learning research and application in the robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with objects appearing in the dynamic environments of the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: https://lifelong-robotic-vision.github.io/competition/.
Submitted 26 April, 2020;
originally announced April 2020.
-
QC-SPHRAM: Quasi-conformal Spherical Harmonics Based Geometric Distortions on Hippocampal Surfaces for Early Detection of the Alzheimer's Disease
Authors:
Anthony Hei-Long Chan,
Yishan Luo,
Lin Shi,
Ronald Lok-Ming Lui
Abstract:
We propose a disease classification model, called QC-SPHARM, for the early detection of Alzheimer's Disease (AD). The proposed QC-SPHARM can distinguish between normal control (NC) subjects and AD patients, as well as between amnestic mild cognitive impairment (aMCI) patients with a high probability of progressing to AD and those without. Using spherical harmonics (SPHARM) based registration, hippocampal surfaces segmented from the ADNI data are individually registered to a template surface constructed from the NC subjects using SPHARM. Local geometric distortions of the deformation from the template surface to each subject are quantified in terms of conformality distortions and curvature distortions. The measurements are combined with the spherical harmonics coefficients and the total volume change of the subject from the template. Afterwards, a t-test based feature selection method incorporating the bagging strategy is applied to extract those local regions having high discriminating power between the two classes. The disease diagnosis machine can therefore be built using the data under the Support Vector Machine (SVM) setting. Using 110 NC subjects and 110 AD patients from the ADNI database, the proposed algorithm achieves 85.2% testing accuracy on 80 random samples as testing subjects, with the incorporation of surface geometry in the classification machine. Using 20 aMCI patients who have advanced to AD during a two-year period and another 20 aMCI patients who remain non-AD for the next two years, the algorithm achieves 81.2% accuracy using 10 randomly picked subjects as testing data. Our proposed method is 6%-15% better than other classification models without the incorporation of surface geometry. The results demonstrate the advantages of using local geometric distortions as the discriminating criterion for early AD diagnosis.
Submitted 19 March, 2020;
originally announced March 2020.
-
An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality
Authors:
Silviu Pitis,
Harris Chan,
Kiarash Jamali,
Jimmy Ba
Abstract:
Distances are pervasive in machine learning. They serve as similarity measures, loss functions, and learning targets; it is said that a good distance measure solves a task. When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically--to prove convergence and optimality guarantees--and empirically--as an inductive bias. Deep metric learning architectures that respect the triangle inequality rely, almost exclusively, on Euclidean distance in the latent space. Though effective, this fails to model two broad classes of subadditive distances, common in graphs and reinforcement learning: asymmetric metrics, and metrics that cannot be embedded into Euclidean space. To address these problems, we introduce novel architectures that are guaranteed to satisfy the triangle inequality. We prove our architectures universally approximate norm-induced metrics on $\mathbb{R}^n$, and present a similar result for modified Input Convex Neural Networks. We show that our architectures outperform existing metric approaches when modeling graph distances and have a better inductive bias than non-metric approaches when training data is limited in the multi-goal reinforcement learning setting.
Submitted 6 July, 2020; v1 submitted 13 February, 2020;
originally announced February 2020.
-
OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning
Authors:
Qi She,
Fan Feng,
Xinyue Hao,
Qihan Yang,
Chuanlin Lan,
Vincenzo Lomonaco,
Xuesong Shi,
Zhengwei Wang,
Yao Guo,
Yimin Zhang,
Fei Qiao,
Rosa H. M. Chan
Abstract:
The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption of non-varying distributions for a fixed set of tasks. Fully retraining models each time a new task becomes available is infeasible due to computational, storage and sometimes privacy issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. It is crucial for robots to operate continuously under open-set and detrimental conditions with adaptive visual perceptual systems, where lifelong learning is a fundamental capability. However, very few datasets and benchmarks are available to evaluate and compare emerging techniques. To fill this gap, we provide a new lifelong robotic vision dataset ("OpenLORIS-Object") collected via RGB-D cameras. The dataset embeds the challenges faced by a robot in real-life applications and provides new benchmarks for validating lifelong object recognition algorithms. Moreover, we have provided a testbed of 9 state-of-the-art lifelong learning algorithms. Each of them involves 48 tasks with 4 evaluation metrics over the OpenLORIS-Object dataset. The results demonstrate that the object recognition task in environments of ever-changing difficulty is far from being solved and that the bottlenecks are at the forward/backward transfer designs. Our dataset and benchmark are publicly available at https://lifelong-robotic-vision.github.io/dataset/object.
Submitted 6 March, 2020; v1 submitted 15 November, 2019;
originally announced November 2019.
-
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Authors:
Yeming Wen,
Kevin Luk,
Maxime Gazeau,
Guodong Zhang,
Harris Chan,
Jimmy Ba
Abstract:
The choice of batch size in a stochastic optimization algorithm plays a substantial role in both optimization and generalization. Increasing the batch size typically improves optimization but degrades generalization. To address the problem of improving generalization while maintaining optimal convergence in large-batch training, we propose to add covariance noise to the gradients. We demonstrate that the learning performance of our method is more accurately captured by the structure of the covariance matrix of the noise than by the variance of the gradients. Moreover, in the convex-quadratic setting, we prove that it can be characterized by the Frobenius norm of the noise matrix. Our empirical studies with standard deep learning model architectures and datasets show that our method not only improves generalization performance in large-batch training, but also does so in a way where the optimization performance remains desirable and the training duration is not elongated.
Submitted 28 February, 2020; v1 submitted 21 February, 2019;
originally announced February 2019.
-
ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning
Authors:
Harris Chan,
Yuhuai Wu,
Jamie Kiros,
Sanja Fidler,
Jimmy Ba
Abstract:
Sparse reward is one of the most challenging problems in reinforcement learning (RL). Hindsight Experience Replay (HER) attempts to address this issue by converting a failed experience to a successful one by relabeling the goals. Despite its effectiveness, HER has limited applicability because it lacks a compact and universal goal representation. We present Augmenting experienCe via TeacheR's adviCE (ACTRCE), an efficient reinforcement learning technique that extends the HER framework using natural language as the goal representation. We first analyze the differences among goal representations, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with non-language goal representation fails to learn. We also show that with language goal representations, the agent can generalize to unseen instructions, and even generalize to instructions with unseen lexicons. We further demonstrate that it is crucial to use hindsight advice to solve challenging tasks, and that even a small amount of advice is sufficient for the agent to achieve good performance.
Submitted 12 February, 2019;
originally announced February 2019.
-
Are You Sure You Want To Do That? Classification with Verification
Authors:
Harris Chan,
Atef Chaudhury,
Kevin Shen
Abstract:
Classification systems typically act in isolation, meaning they are required to implicitly memorize the characteristics of all candidate classes in order to classify. The cost of this is increased memory usage and poor sample efficiency. We propose a model which instead verifies using reference images during the classification process, reducing the burden of memorization. The model uses iterative nondifferentiable queries in order to classify an image. We demonstrate that such a model is feasible to train and can match baseline accuracy while being more parameter efficient. However, we show that finding the correct balance between image recognition and verification is essential to pushing the model towards desired behavior, suggesting that a pipeline of recognition followed by verification is a more promising approach.
Submitted 12 September, 2018; v1 submitted 7 September, 2018;
originally announced September 2018.
-
Geared Rotationally Identical and Invariant Convolutional Neural Network Systems
Authors:
ShihChung B. Lo (Ph.D.),
Matthew T. Freedman (M.D.),
Seong K. Mun (Ph.D.),
Heang-Ping Chan (Ph.D.)
Abstract:
Theorems and techniques to form different types of transformationally invariant processing and to produce the same output quantitatively based on either transformationally invariant operators or symmetric operations have recently been introduced by the authors. In this study, we further propose to compose a geared rotationally identical CNN system (GRI-CNN) with a small step angle by connecting the networks of the participating processes at the first flatten layer. Using an ordinary CNN structure as a base, requirements for constructing a GRI-CNN include the use of either a symmetric input vector or kernels with an angle increment that can form a complete cycle as a "gearwheel". Four basic GRI-CNN structures were studied. Each of them can produce quantitatively identical output results when the rotation angle of the input vector is evenly divisible by the step angle of the gear. Our study showed that when the rotation angle of an input vector does not match a step angle, the GRI-CNN can still produce a highly consistent result. With a design using an ultra-fine gear-tooth step angle (e.g., 1 degree or 0.1 degree), all four GRI-CNN systems can be constructed to be virtually isotropic.
Submitted 10 August, 2018; v1 submitted 2 August, 2018;
originally announced August 2018.
-
Infinite Arms Bandit: Optimality via Confidence Bounds
Authors:
Hock Peng Chan,
Shouri Hu
Abstract:
Berry et al. (1997) initiated the development of the infinite arms bandit problem. They derived a regret lower bound of all allocation strategies for Bernoulli rewards with uniform priors, and proposed strategies based on success runs. Bonald and Proutière (2013) proposed a two-target algorithm that achieves the regret lower bound, and extended optimality to Bernoulli rewards with general priors. We present here a confidence bound target (CBT) algorithm that achieves optimality for rewards that are bounded above. For each arm we construct a confidence bound and compare it against each other and a target value to determine if the arm should be sampled further. The target value depends on the assumed priors of the arm means. In the absence of information on the prior, the target value is determined empirically. Numerical studies here show that CBT is versatile and outperforms its competitors.
Submitted 21 June, 2020; v1 submitted 29 May, 2018;
originally announced May 2018.
-
Cost-sensitive detection with variational autoencoders for environmental acoustic sensing
Authors:
Yunpeng Li,
Ivan Kiskin,
Davide Zilli,
Marianne Sinka,
Henry Chan,
Kathy Willis,
Stephen Roberts
Abstract:
Environmental acoustic sensing involves the retrieval and processing of audio signals to better understand our surroundings. While large-scale acoustic data make manual analysis infeasible, they provide a suitable playground for machine learning approaches. Most existing machine learning techniques developed for environmental acoustic sensing do not provide flexible control of the trade-off between the false positive rate and the false negative rate. This paper presents a cost-sensitive classification paradigm, in which the hyper-parameters of classifiers and the structure of variational autoencoders are selected in a principled Neyman-Pearson framework. We examine the performance of the proposed approach using a dataset from the HumBug project which aims to detect the presence of mosquitoes using sound collected by simple embedded devices.
Submitted 6 December, 2017;
originally announced December 2017.
-
Mosquito detection with low-cost smartphones: data acquisition for malaria research
Authors:
Yunpeng Li,
Davide Zilli,
Henry Chan,
Ivan Kiskin,
Marianne Sinka,
Stephen Roberts,
Kathy Willis
Abstract:
Mosquitoes are a major vector for malaria, causing hundreds of thousands of deaths in the developing world each year. Not only is the prevention of mosquito bites of paramount importance to the reduction of malaria transmission cases, but understanding in more forensic detail the interplay between malaria, mosquito vectors, vegetation, standing water and human populations is crucial to the deployment of more effective interventions. Typically the presence and detection of malaria-vectoring mosquitoes is only quantified by hand-operated insect traps or signified by the diagnosis of malaria. If we are to gather timely, large-scale data to improve this situation, we need to automate the process of mosquito detection and classification as much as possible. In this paper, we present a candidate mobile sensing system that acts as both a portable early warning device and an automatic acoustic data acquisition pipeline to help fuel scientific inquiry and policy. The machine learning algorithm that powers the mobile system achieves excellent off-line multi-species detection performance while remaining computationally efficient. Further, we have conducted preliminary live mosquito detection tests using low-cost mobile phones and achieved promising results. The deployment of this system for field usage in Southeast Asia and Africa is planned in the near future. In order to accelerate processing of field recordings and labelling of collected data, we employ a citizen science platform in conjunction with automated methods, the former implemented using the Zooniverse platform, allowing crowdsourcing on a grand scale.
Submitted 5 December, 2017; v1 submitted 16 November, 2017;
originally announced November 2017.
-
An Efficient and Flexible Spike Train Model via Empirical Bayes
Authors:
Qi She,
Xiaoli Wu,
Beth Jelfs,
Adam S. Charles,
Rosa H. M. Chan
Abstract:
Accurate statistical models of neural spike responses can characterize the information carried by neural populations. But the limited samples of spike counts during recording usually result in model overfitting. Besides, current models assume spike counts to be Poisson-distributed, which ignores the fact that many neurons demonstrate over-dispersed spiking behaviour. Although the Negative Binomial Generalized Linear Model (NB-GLM) provides a powerful tool for modeling over-dispersed spike counts, the maximum likelihood-based standard NB-GLM leads to highly variable and inaccurate parameter estimates. Thus, we propose a hierarchical parametric empirical Bayes method to estimate the neural spike responses among neuronal populations. Our method integrates both Generalized Linear Models (GLMs) and empirical Bayes theory, and aims to (1) improve the accuracy and reliability of parameter estimation, compared to the maximum likelihood-based method for NB-GLM and Poisson-GLM; (2) effectively capture the over-dispersed nature of spike counts from both simulated data and experimental data; and (3) provide insight into both neural interactions and spiking behaviours of the neuronal populations. We apply our approach to study both simulated data and experimental neural data. The estimation on simulated data indicates that the new framework can accurately predict mean spike counts simulated from different models and recover the connectivity weights among neural populations. The estimation based on retinal neurons demonstrates that the proposed method outperforms both NB-GLM and Poisson-GLM in terms of the predictive log-likelihood of held-out data. Code is available at https://doi.org/10.5281/zenodo.4704423
Submitted 27 April, 2021; v1 submitted 10 May, 2016;
originally announced May 2016.
-
Adaptive Image Denoising by Mixture Adaptation
Authors:
Enming Luo,
Stanley H. Chan,
Truong Q. Nguyen
Abstract:
We propose an adaptive learning procedure to learn patch-based image priors for image denoising. The new algorithm, called the Expectation-Maximization (EM) adaptation, takes a generic prior learned from a generic external database and adapts it to the noisy image to generate a specific prior. Different from existing methods that combine internal and external statistics in ad-hoc ways, the proposed algorithm is rigorously derived from a Bayesian hyper-prior perspective. There are two contributions of this paper: First, we provide full derivation of the EM adaptation algorithm and demonstrate methods to improve the computational complexity. Second, in the absence of the latent clean image, we show how EM adaptation can be modified based on pre-filtering. Experimental results show that the proposed adaptation algorithm yields consistently better denoising results than the one without adaptation and is superior to several state-of-the-art algorithms.
Submitted 24 June, 2016; v1 submitted 18 January, 2016;
originally announced January 2016.
-
Topic-adjusted visibility metric for scientific articles
Authors:
Linda S. L. Tan,
Aik Hui Chan,
Tian Zheng
Abstract:
Measuring the impact of scientific articles is important for evaluating the research output of individual scientists, academic institutions and journals. While citations are raw data for constructing impact measures, there exist biases and potential issues if factors affecting citation patterns are not properly accounted for. In this work, we address the problem of field variation and introduce an article level metric useful for evaluating individual articles' visibility. This measure derives from joint probabilistic modeling of the content in the articles and the citations amongst them using latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMSB). Our proposed model provides a visibility metric for individual articles adjusted for field variation in citation rates, a structural understanding of citation behavior in different fields, and article recommendations which take into account article visibility and citation patterns. We develop an efficient algorithm for model fitting using variational methods. To scale up to large networks, we develop an online variant using stochastic gradient methods and case-control likelihood approximation. We apply our methods to the benchmark KDD Cup 2003 dataset with approximately 30,000 high energy physics papers.
Submitted 16 October, 2015; v1 submitted 25 February, 2015;
originally announced February 2015.
-
Mutual Information-Based Unsupervised Feature Transformation for Heterogeneous Feature Subset Selection
Authors:
Min Wei,
Tommy W. S. Chow,
Rosa H. M. Chan
Abstract:
Conventional mutual information (MI) based feature selection (FS) methods are unable to handle heterogeneous feature subset selection properly because of data format differences or estimation methods of MI between feature subset and class label. A way to solve this problem is feature transformation (FT). In this study, a novel unsupervised feature transformation (UFT) which can transform non-numerical features into numerical features is developed and tested. The UFT process is MI-based and independent of class label. MI-based FS algorithms, such as Parzen window feature selector (PWFS), minimum redundancy maximum relevance feature selection (mRMR), and normalized MI feature selection (NMIFS), can all adopt UFT for pre-processing of non-numerical features. Unlike traditional FT methods, the proposed UFT is unbiased while PWFS is utilized to its full advantage. Simulations and analyses of large-scale datasets showed that the feature subset selected by the integrated method, UFT-PWFS, outperformed those of other FT-FS integrated methods in classification accuracy.
Submitted 29 March, 2015; v1 submitted 24 November, 2014;
originally announced November 2014.
-
Adaptive Image Denoising by Targeted Databases
Authors:
Enming Luo,
Stanley H. Chan,
Truong Q. Nguyen
Abstract:
We propose a data-dependent denoising procedure to restore noisy images. Different from existing denoising algorithms which search for patches from either the noisy image or a generic database, the new algorithm finds patches from a database that contains only relevant patches. We formulate the denoising problem as an optimal filter design problem and make two contributions. First, we determine the basis function of the denoising filter by solving a group sparsity minimization problem. The optimization formulation generalizes existing denoising algorithms and offers systematic analysis of the performance. Improvement methods are proposed to enhance the patch search process. Second, we determine the spectral coefficients of the denoising filter by considering a localized Bayesian prior. The localized prior leverages the similarity of the targeted database, alleviates the intensive Bayesian computation, and links the new method to the classical linear minimum mean squared error estimation. We demonstrate applications of the proposed method in a variety of scenarios, including text images, multiview images and face images. Experimental results show the superiority of the new algorithm over existing methods.
Submitted 3 November, 2014; v1 submitted 30 June, 2014;
originally announced July 2014.
-
A Consistent Histogram Estimator for Exchangeable Graph Models
Authors:
Stanley H. Chan,
Edoardo M. Airoldi
Abstract:
Exchangeable graph models (ExGM) subsume a number of popular network models. The mathematical object that characterizes an ExGM is termed a graphon. Finding scalable estimators of graphons, provably consistent, remains an open issue. In this paper, we propose a histogram estimator of a graphon that is provably consistent and numerically efficient. The proposed estimator is based on a sorting-and-smoothing (SAS) algorithm, which first sorts the empirical degree of a graph, then smooths the sorted graph using total variation minimization. The consistency of the SAS algorithm is proved by leveraging sparsity concepts from compressed sensing.
Submitted 11 February, 2014; v1 submitted 8 February, 2014;
originally announced February 2014.
-
Monte Carlo non local means: Random sampling for large-scale image filtering
Authors:
Stanley H. Chan,
Todd Zickler,
Yue M. Lu
Abstract:
We propose a randomized version of the non-local means (NLM) algorithm for large-scale image filtering. The new algorithm, called Monte Carlo non-local means (MCNLM), speeds up the classical NLM by computing a small subset of image patch distances, which are randomly selected according to a designed sampling pattern. We make two contributions. First, we analyze the performance of the MCNLM algorithm and show that, for large images or large external image databases, the random outcomes of MCNLM are tightly concentrated around the deterministic full NLM result. In particular, our error probability bounds show that, at any given sampling ratio, the probability for MCNLM to have a large deviation from the original NLM solution decays exponentially as the size of the image or database grows. Second, we derive explicit formulas for optimal sampling patterns that minimize the error probability bound by exploiting partial knowledge of the pairwise similarity weights. Numerical experiments show that MCNLM is competitive with other state-of-the-art fast NLM algorithms for single-image denoising. When applied to denoising images using an external database containing ten billion patches, MCNLM returns a randomized solution that is within 0.2 dB of the full NLM solution while reducing the runtime by three orders of magnitude.
Submitted 14 May, 2014; v1 submitted 27 December, 2013;
originally announced December 2013.
-
Stochastic blockmodel approximation of a graphon: Theory and consistent estimation
Authors:
Edoardo M Airoldi,
Thiago B Costa,
Stanley H Chan
Abstract:
Non-parametric approaches for analyzing network data based on exchangeable graph models (ExGM) have recently gained interest. The key object that defines an ExGM is often referred to as a graphon. This non-parametric perspective on network modeling poses challenging questions on how to make inference on the graphon underlying observed network data. In this paper, we propose a computationally efficient procedure to estimate a graphon from a set of observed networks generated from it. This procedure is based on a stochastic blockmodel approximation (SBA) of the graphon. We show that, by approximating the graphon with a stochastic block model, the graphon can be consistently estimated, that is, the estimation error vanishes as the size of the graph approaches infinity.
Submitted 7 November, 2013; v1 submitted 7 November, 2013;
originally announced November 2013.
-
Multiscale Adaptive Inference on Conditional Moment Inequalities
Authors:
Timothy B. Armstrong,
Hock Peng Chan
Abstract:
This paper considers inference for conditional moment inequality models using a multiscale statistic. We derive the asymptotic distribution of this test statistic and use the result to propose feasible critical values that have a simple analytic formula, and to prove the asymptotic validity of a modified bootstrap procedure. The asymptotic distribution is extreme value, and the proof uses new techniques to overcome several technical obstacles. The test detects local alternatives that approach the identified set at the best rate among available tests in a broad class of models, and is adaptive to the smoothness properties of the data generating process. Our results also have implications for the use of moment selection procedures in this setting. We provide a Monte Carlo study and an empirical illustration of inference in a regression model with endogenously censored and missing data.
Submitted 8 December, 2015; v1 submitted 22 December, 2012;
originally announced December 2012.
-
Reasoning about Bayesian Network Classifiers
Authors:
Hei Chan,
Adnan Darwiche
Abstract:
Bayesian network classifiers are used in many fields, and one common class of such classifiers is the naive Bayes classifier. In this paper, we introduce an approach for reasoning about Bayesian network classifiers in which we explicitly convert them into Ordered Decision Diagrams (ODDs), which are then used to reason about the properties of these classifiers. Specifically, we present an algorithm for converting any naive Bayes classifier into an ODD, and we show theoretically and experimentally that this algorithm can give us an ODD that is tractable in size even given an intractable number of instances. Since ODDs are tractable representations of classifiers, our algorithm allows us to efficiently test the equivalence of two naive Bayes classifiers and characterize discrepancies between them. We also show a number of additional results, including a count of distinct classifiers that can be induced by changing some CPT in a naive Bayes classifier, and the range of allowable changes to a CPT that keep the current classifier unchanged.
Submitted 19 October, 2012;
originally announced December 2012.
-
Detection with the scan and the average likelihood ratio
Authors:
Hock Peng Chan,
Guenther Walther
Abstract:
We investigate the performance of the scan (maximum likelihood ratio statistic) and of the average likelihood ratio statistic in the problem of detecting a deterministic signal with unknown spatial extent in the prototypical univariate sampled data model with white Gaussian noise. Our results show that the scan statistic, a popular tool for detection problems, is optimal only for the detection of signals with the smallest spatial extent. For signals with larger spatial extent the scan is suboptimal, and the power loss can be considerable. In contrast, the average likelihood ratio statistic is optimal for the detection of signals on all scales except the smallest ones, where its performance is only slightly suboptimal. We give rigorous mathematical statements of these results as well as heuristic explanations which suggest that the essence of these findings applies to detection problems quite generally, such as the detection of clusters in models involving densities or intensities or the detection of multivariate signals. We present a modification of the average likelihood ratio that yields optimal detection of signals with arbitrary spatial extent and which has the additional benefit of allowing for a fast computation of the statistic. In contrast, optimal detection with the scan seems to require the use of scale-dependent critical values.
Submitted 25 February, 2014; v1 submitted 21 July, 2011;
originally announced July 2011.
-
Importance Sampling of Word Patterns in DNA and Protein Sequences
Authors:
Hock Peng Chan,
Nancy R. Zhang,
Louis H. Y. Chen
Abstract:
Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: Palindromes and inverted repeats, patterns arising from position specific weight matrices and co-occurrences of pairs of motifs.
Submitted 26 November, 2008;
originally announced November 2008.