-
Understanding Deep Learning via Decision Boundary
Authors:
Shiye Lei,
Fengxiang He,
Yancheng Yuan,
Dacheng Tao
Abstract:
This paper discovers that the neural network with lower decision boundary (DB) variability has better generalizability. Two new notions, algorithm DB variability and $(ε, η)$-data DB variability, are proposed to measure the decision boundary variability from the algorithm and data perspectives. Extensive experiments show significant negative correlations between the decision boundary variability and the generalizability. From the theoretical view, two lower bounds based on algorithm DB variability are proposed and do not explicitly depend on the sample size. We also prove an upper bound of order $\mathcal{O}\left(\frac{1}{\sqrt{m}}+ε+η\log\frac{1}η\right)$ based on data DB variability. The bound is convenient to estimate without the requirement of labels, and does not explicitly depend on the network size which is usually prohibitively large in deep learning.
Submitted 24 December, 2023; v1 submitted 3 June, 2022;
originally announced June 2022.
-
Spatial-Temporal-Fusion BNN: Variational Bayesian Feature Layer
Authors:
Shiye Lei,
Zhuozhuo Tu,
Leszek Rutkowski,
Feng Zhou,
Li Shen,
Fengxiang He,
Dacheng Tao
Abstract:
Bayesian neural networks (BNNs) have become a principal approach to alleviate overconfident predictions in deep learning, but they often suffer from scaling issues due to a large number of distribution parameters. In this paper, we discover that the first layer of a deep network possesses multiple disparate optima when solely retrained. This indicates a large posterior variance when the first layer is altered by a Bayesian layer, which motivates us to design a spatial-temporal-fusion BNN (STF-BNN) for efficiently scaling BNNs to large models: (1) a neural network is first trained normally from scratch to realize fast training; and (2) the first layer is then converted to a Bayesian layer and inferred by stochastic variational inference, while the other layers are fixed. Compared to vanilla BNNs, our approach can greatly reduce the training time and the number of parameters, which helps scale BNNs efficiently. We further provide theoretical guarantees on the generalizability of STF-BNN and on its capability of mitigating overconfidence. Comprehensive experiments demonstrate that STF-BNN (1) achieves state-of-the-art performance on prediction and uncertainty quantification; (2) significantly improves adversarial robustness and privacy preservation; and (3) considerably reduces training time and memory costs.
Submitted 12 December, 2021;
originally announced December 2021.
-
Spectral Complexity-scaled Generalization Bound of Complex-valued Neural Networks
Authors:
Haowen Chen,
Fengxiang He,
Shiye Lei,
Dacheng Tao
Abstract:
Complex-valued neural networks (CVNNs) have been widely applied to various fields, especially signal processing and image recognition. However, few works focus on the generalization of CVNNs, although it is vital to ensure the performance of CVNNs on unseen data. This paper is the first work that proves a generalization bound for the complex-valued neural network. The bound scales with the spectral complexity, the dominant factor of which is the spectral norm product of weight matrices. Further, our work provides a generalization bound for CVNNs when training data is sequential, which is also affected by the spectral complexity. Theoretically, these bounds are derived via the Maurey Sparsification Lemma and the Dudley Entropy Integral. Empirically, we conduct experiments by training complex-valued convolutional neural networks on different datasets: MNIST, FashionMNIST, CIFAR-10, CIFAR-100, Tiny ImageNet, and IMDB. Spearman's rank-order correlation coefficients and the corresponding p values on these datasets give strong evidence that the spectral complexity of the network, measured by the spectral norm product of the weight matrices, has a statistically significant correlation with the generalization ability.
Submitted 6 December, 2021;
originally announced December 2021.
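The abstract above names the spectral norm product of the weight matrices as the dominant factor of the spectral complexity. As a rough sketch of that one factor only (not the full spectral complexity, and not the authors' code), the following pure-Python snippet estimates the largest singular value of each complex-valued matrix by power iteration and multiplies the results. The function names `spectral_norm` and `spectral_norm_product`, the iteration count, and the toy matrices are assumptions made for illustration.

```python
import math


def spectral_norm(matrix, iters=200):
    """Estimate the largest singular value of a (possibly complex) matrix.

    Uses power iteration on A^H A. Hypothetical helper for illustration;
    the all-ones start vector can fail if it happens to be orthogonal to
    the top right singular vector.
    """
    n_cols = len(matrix[0])
    scale = 1.0 / math.sqrt(n_cols)
    v = [complex(scale, 0.0)] * n_cols  # unit-norm start vector
    for _ in range(iters):
        # u = A v
        u = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        # w = A^H u
        w = [sum(row[j].conjugate() * u_i for row, u_i in zip(matrix, u))
             for j in range(n_cols)]
        norm_w = math.sqrt(sum(abs(x) ** 2 for x in w))
        if norm_w == 0.0:
            return 0.0  # A v vanished, e.g. for the zero matrix
        v = [x / norm_w for x in w]
    # With v of unit norm, ||A v|| approximates the top singular value.
    av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return math.sqrt(sum(abs(x) ** 2 for x in av))


def spectral_norm_product(weight_matrices):
    """Multiply the estimated spectral norms of all weight matrices."""
    product = 1.0
    for matrix in weight_matrices:
        product *= spectral_norm(matrix)
    return product


# Toy complex-valued "layers" (made-up values, not from the paper).
W1 = [[3, 0], [0, 1]]    # singular values 3 and 1
W2 = [[0, 2j], [1, 0]]   # singular values 2 and 1
product = spectral_norm_product([W1, W2])
```

For the toy matrices the product is the top singular value of `W1` times that of `W2`. In practice a linear algebra library would normally replace the hand-written power iteration.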
-
Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
Authors:
Nan Lu,
Shida Lei,
Gang Niu,
Issei Sato,
Masashi Sugiyama
Abstract:
To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention these days. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing risk-consistent methods are theoretically grounded with high flexibility, they can learn only from two U sets. In this paper, we propose a new approach for binary classification from $m$ U sets for $m\ge2$. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC), which is aimed at predicting from which U set each observed data point is drawn. SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation. We build our method in a flexible and efficient end-to-end deep learning framework and prove it to be classifier-consistent. Through experiments, we demonstrate the superiority of our proposed method over state-of-the-art methods.
Submitted 11 June, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Neural networks behave as hash encoders: An empirical study
Authors:
Fengxiang He,
Shiye Lei,
Jianmin Ji,
Dacheng Tao
Abstract:
The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions, each corresponding to a specific activation pattern of the included ReLU-like activations. We demonstrate that this partition exhibits the following encoding properties across a variety of deep learning models: (1) determinism: almost every linear region contains at most one training example. We can therefore represent almost every training example by a unique activation pattern, which is parameterized by a neural code; and (2) categorization: according to the neural code, simple algorithms, such as $K$-Means, $K$-NN, and logistic regression, can achieve fairly good performance on both training and test data. These encoding properties surprisingly suggest that normal neural networks well-trained for classification behave as hash encoders without any extra efforts. In addition, the encoding properties exhibit variability in different scenarios. Further experiments demonstrate that model size, training time, training sample size, regularization, and label noise contribute to shaping the encoding properties, while the impacts of the first three are dominant. We then define an activation hash phase chart to represent the space expanded by model size, training time, training sample size, and the encoding properties, which is divided into three canonical regions: under-expressive regime, critically-expressive regime, and sufficiently-expressive regime. The source code package is available at https://github.com/LeavesLei/activation-code.
Submitted 14 January, 2021;
originally announced January 2021.
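To make the notion of a neural code in the abstract above concrete, here is a minimal sketch that records which ReLU units fire for a given input in a tiny fully connected network. The helper name `neural_code`, the two-layer layout, and all weights are assumptions made for illustration; they are not taken from the paper or from its source code package.

```python
def neural_code(x, layers):
    """Return the ReLU activation pattern of input x as a tuple of 0/1 bits.

    `layers` is a list of (weights, biases) pairs, where weights is a list
    of rows. Hypothetical helper for illustration only.
    """
    code = []
    h = list(x)
    for weights, biases in layers:
        # Pre-activations of this layer.
        pre = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        # One bit per unit: 1 if the ReLU is active, 0 otherwise.
        code.extend(1 if z > 0 else 0 for z in pre)
        # Apply ReLU before feeding the next layer.
        h = [max(z, 0.0) for z in pre]
    return tuple(code)


# Toy two-layer network with made-up weights (assumption, not from the paper).
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, -0.75, 0.5]),
    ([[1.0, 1.0, -1.0], [-1.0, 0.5, 1.0]], [0.0, 0.1]),
]

code_a = neural_code([2.0, 1.0], LAYERS)   # (1, 1, 1, 1, 0)
code_b = neural_code([-1.0, 3.0], LAYERS)  # (0, 1, 1, 0, 1)
```

Inputs that share a code lie in the same linear region of this toy network, so the two inputs here fall in different regions. Because the code is a hashable tuple, it can be used directly as a dictionary key, which is one simple way to count how many examples land in each region.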
-
Baseline Estimation of Commercial Building HVAC Fan Power Using Tensor Completion
Authors:
Shunbo Lei,
David Hong,
Johanna L. Mathieu,
Ian A. Hiskens
Abstract:
Commercial building heating, ventilation, and air conditioning (HVAC) systems have been studied for providing ancillary services to power grids via demand response (DR). One critical issue is to estimate the counterfactual baseline power consumption that would have prevailed without DR. Baseline methods have been developed based on whole building electric load profiles. New methods are necessary to estimate the baseline power consumption of HVAC sub-components (e.g., supply and return fans), which have different characteristics compared to those of the whole building. Tensor completion can estimate the unobserved entries of multi-dimensional tensors describing complex data sets. It exploits high-dimensional data to capture granular insights into the problem. This paper proposes to use it for baselining HVAC fan power, by utilizing its capability of capturing dominant fan power patterns. The tensor completion method is evaluated using HVAC fan power data from several buildings at the University of Michigan, and compared with several existing methods. The tensor completion method generally outperforms the benchmarks.
Submitted 24 April, 2020;
originally announced April 2020.
-
Robust Regression via Online Feature Selection under Adversarial Data Corruption
Authors:
Xuchao Zhang,
Shuo Lei,
Liang Zhao,
Arnold P. Boedihardjo,
Chang-Tien Lu
Abstract:
The presence of data corruption in user-generated streaming data, such as social media, motivates a new fundamental problem: learning reliable regression coefficients when features are not accessible entirely at one time. Until now, several important challenges still cannot be handled concurrently: 1) corrupted data estimation when only partial features are accessible; 2) online feature selection when data contains adversarial corruption; and 3) scaling to a massive dataset. This paper proposes a novel RObust regression algorithm via Online Feature Selection (RoOFS) that concurrently addresses all the above challenges. Specifically, the algorithm iteratively updates the regression coefficients and the uncorrupted set via a robust online feature substitution method. We also prove that our algorithm has a restricted error bound compared to the optimal solution. Extensive empirical experiments on both synthetic and real-world datasets demonstrate that our new method is more effective than existing methods in the recovery of both feature selection and regression coefficients, with very competitive efficiency.
Submitted 5 February, 2019;
originally announced February 2019.
-
Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets
Authors:
Xinzhi Han,
Sen Lei
Abstract:
With the rapid advance of the Internet, search engines (e.g., Google, Bing, Yahoo!) are used by billions of users every day. The main function of a search engine is to locate the most relevant webpages corresponding to what the user requests. This report focuses on the core problem of information retrieval: how to learn the relevance between a document (very often a webpage) and a query given by a user. Our analysis consists of two parts: 1) we use standard statistical methods to select important features among 137 candidates given by information retrieval researchers from Microsoft. We find that not all the features are useful, and give interpretations of the top-selected features; 2) we give baselines on prediction over the real-world dataset MSLR-WEB by using various learning algorithms. We find that boosted-tree and random-forest models generally achieve the best prediction performance. This agrees with the mainstream opinion in the information retrieval community that tree-based algorithms outperform the other candidates for this problem.
Submitted 13 March, 2018;
originally announced March 2018.
-
CFAR Adaptive Matched Detector for Target Detection in Non-Gaussian Noise With Inverse Gamma Texture
Authors:
Shiwen Lei,
Andreas Jakobsson,
Zhiqin Zhao
Abstract:
In this paper, we propose an adaptive matched detector for a signal corrupted by non-Gaussian noise with an inverse gamma texture. The detector is formed using a set of secondary data measurements, and is analytically shown to have a constant false alarm rate. The analytic performance is validated using Monte Carlo simulations, and the proposed detector is shown to offer preferable performance compared to the related one-step generalized likelihood ratio test (1S-GLRT) and the adaptive subspace detector (ASD).
Submitted 12 May, 2017;
originally announced May 2017.