-
Non-Asymptotic Analysis of Online Local Private Learning with SGD
Authors:
Enze Shi,
Jinhan Xie,
Bei Jiang,
Linglong Kong,
Xuming He
Abstract:
Differentially Private Stochastic Gradient Descent (DP-SGD) has been widely used for solving optimization problems with privacy guarantees in machine learning and statistics. Despite this, a systematic non-asymptotic convergence analysis for DP-SGD, particularly in the context of online problems and local differential privacy (LDP) models, remains largely elusive. Existing non-asymptotic analyses…
▽ More
Differentially Private Stochastic Gradient Descent (DP-SGD) has been widely used for solving optimization problems with privacy guarantees in machine learning and statistics. Despite this, a systematic non-asymptotic convergence analysis for DP-SGD, particularly in the context of online problems and local differential privacy (LDP) models, remains largely elusive. Existing non-asymptotic analyses have focused on non-private optimization methods, and hence are not applicable to privacy-preserving optimization problems. This work initiates the analysis to bridge this gap and opens the door to non-asymptotic convergence analysis of private optimization problems. A general framework is investigated for the online LDP model in stochastic optimization problems. We assume that sensitive information from individuals is collected sequentially and aim to estimate, in real-time, a static parameter that pertains to the population of interest. Most importantly, we conduct a comprehensive non-asymptotic convergence analysis of the proposed estimators in finite-sample situations, which gives their users practical guidelines regarding the effect of various hyperparameters, such as step size, parameter dimensions, and privacy budgets, on convergence rates. Our proposed estimators are validated in the theoretical and practical realms by rigorous mathematical derivations and carefully constructed numerical experiments.
△ Less
Submitted 9 July, 2025;
originally announced July 2025.
-
Research on the recommendation framework of foreign enterprises from the perspective of multidimensional proximity
Authors:
Guoqiang Liang,
Jiarui Xie,
Mengxuan Li,
Shuo Zhang
Abstract:
As global economic integration progresses, foreign-funded enterprises play an increasingly crucial role in fostering local economic growth and enhancing industrial development. However, there are not many researches to deal with this aspect in recent years. This study utilizes the multidimensional proximity theory to thoroughly examine the criteria for selecting high-quality foreign-funded compani…
▽ More
As global economic integration progresses, foreign-funded enterprises play an increasingly crucial role in fostering local economic growth and enhancing industrial development. However, there are not many researches to deal with this aspect in recent years. This study utilizes the multidimensional proximity theory to thoroughly examine the criteria for selecting high-quality foreign-funded companies that are likely to invest in and establish factories in accordance with local conditions during the investment attraction process.First, this study leverages databases such as Wind and Osiris, along with government policy documents, to investigate foreign-funded enterprises and establish a high-quality database. Second, using a two-step method, enterprises aligned with local industrial strategies are identified. Third, a detailed analysis is conducted on key metrics, including industry revenue, concentration (measured by the Herfindahl-Hirschman Index), and geographical distance (calculated using the Haversine formula). Finally, a multi-criteria decision analysis ranks the top five companies as the most suitable candidates for local investment, with the methodology validated through a case study in a district of Beijing.The example results show that the established framework helps local governments identify high-quality foreign-funded enterprises.
△ Less
Submitted 21 June, 2025;
originally announced June 2025.
-
A Generalized Framework for Approximate Co-Sufficient Sampling
Authors:
Jie Xie,
Dongming Huang
Abstract:
Approximate co-sufficient sampling (aCSS) offers a principled route to hypothesis testing when null distributions are unknown, yet current implementations are confined to maximum likelihood estimators with smooth or linear regularization and provide little theoretical insight into power. We present a generalized framework that widens the scope of the aCSS method to embrace nonlinear regularization…
▽ More
Approximate co-sufficient sampling (aCSS) offers a principled route to hypothesis testing when null distributions are unknown, yet current implementations are confined to maximum likelihood estimators with smooth or linear regularization and provide little theoretical insight into power. We present a generalized framework that widens the scope of the aCSS method to embrace nonlinear regularization, such as group lasso and nonconvex penalties, as well as robust and nonparametric estimators. Moreover, we introduce a weighted sampling scheme for enhanced flexibility and propose a generalized aCSS framework that unifies existing conditional sampling methods. Our theoretical analysis rigorously establishes validity and, for the first time, characterizes the power optimality of aCSS procedures in certain high-dimensional settings.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
The Spurious Factor Dilemma: Robust Inference in Heavy-Tailed Elliptical Factor Models
Authors:
Jiang Hu,
Jiahui Xie,
Yangchun Zhang,
Wang Zhou
Abstract:
Factor models are essential tools for analyzing high-dimensional data, particularly in economics and finance. However, standard methods for determining the number of factors often overestimate the true number when data exhibit heavy-tailed randomness, misinterpreting noise-induced outliers as genuine factors. This paper addresses this challenge within the framework of Elliptical Factor Models (EFM…
▽ More
Factor models are essential tools for analyzing high-dimensional data, particularly in economics and finance. However, standard methods for determining the number of factors often overestimate the true number when data exhibit heavy-tailed randomness, misinterpreting noise-induced outliers as genuine factors. This paper addresses this challenge within the framework of Elliptical Factor Models (EFM), which accommodate both heavy tails and potential non-linear dependencies common in real-world data. We demonstrate theoretically and empirically that heavy-tailed noise generates spurious eigenvalues that mimic true factor signals. To distinguish these, we propose a novel methodology based on a fluctuation magnification algorithm. We show that under magnifying perturbations, the eigenvalues associated with real factors exhibit significantly less fluctuation (stabilizing asymptotically) compared to spurious eigenvalues arising from heavy-tailed effects. This differential behavior allows the identification and detection of the true and spurious factors. We develop a formal testing procedure based on this principle and apply it to the problem of accurately selecting the number of common factors in heavy-tailed EFMs. Simulation studies and real data analysis confirm the effectiveness of our approach compared to existing methods, particularly in scenarios with pronounced heavy-tailedness.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity
Authors:
Xuejun Sun,
Yiran Song,
Xiaochen Zhou,
Ruilie Cai,
Yu Zhang,
Xinyi Li,
Rui Peng,
Jialiu Xie,
Yuanyuan Yan,
Muyao Tang,
Prem Lakshmanane,
Baiming Zou,
James S. Hagood,
Raymond J. Pickles,
Didong Li,
Fei Zou,
Xiaojing Zheng
Abstract:
Respiratory viral infections pose a global health burden, yet the cellular immune responses driving protection or pathology remain unclear. Natural infection cohorts often lack pre-exposure baseline data and structured temporal sampling. In contrast, inoculation and vaccination trials generate insightful longitudinal transcriptomic data. However, the scattering of these datasets across platforms,…
▽ More
Respiratory viral infections pose a global health burden, yet the cellular immune responses driving protection or pathology remain unclear. Natural infection cohorts often lack pre-exposure baseline data and structured temporal sampling. In contrast, inoculation and vaccination trials generate insightful longitudinal transcriptomic data. However, the scattering of these datasets across platforms, along with inconsistent metadata and preprocessing procedure, hinders AI-driven discovery. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready, rigorously curated dataset that integrates 14,136 RNA-seq profiles from 3,178 subjects across 66 studies encompassing over 2.56 million cells. Spanning vaccination, inoculation, and mixed exposures, the dataset includes microarray, bulk RNA-seq, and single-cell RNA-seq from whole blood, PBMCs, and nasal swabs, sourced from GEO, ImmPort, and ArrayExpress. We harmonized subject-level metadata, standardized outcome measures, applied unified preprocessing pipelines with rigorous quality control, and aligned all data to official gene symbols. To demonstrate the utility of HR-VILAGE-3K3M, we performed predictive modeling of vaccine responders and evaluated batch-effect correction methods. Beyond these initial demonstrations, it supports diverse systems immunology applications and benchmarking of feature selection and transfer learning algorithms. Its scale and heterogeneity also make it ideal for pretraining foundation models of the human immune response and for advancing multimodal learning frameworks. As the largest longitudinal transcriptomic resource for human respiratory viral immunization, it provides an accessible platform for reproducible AI-driven research, accelerating systems immunology and vaccine development against emerging viral threats.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Online differentially private inference in stochastic gradient descent
Authors:
Jinhan Xie,
Enze Shi,
Bei Jiang,
Linglong Kong,
Xuming He
Abstract:
We propose a general privacy-preserving optimization-based framework for real-time environments without requiring trusted data curators. In particular, we introduce a noisy stochastic gradient descent algorithm for online statistical inference with streaming data under local differential privacy constraints. Unlike existing methods that either disregard privacy protection or require full access to…
▽ More
We propose a general privacy-preserving optimization-based framework for real-time environments without requiring trusted data curators. In particular, we introduce a noisy stochastic gradient descent algorithm for online statistical inference with streaming data under local differential privacy constraints. Unlike existing methods that either disregard privacy protection or require full access to the entire dataset, our proposed algorithm provides rigorous local privacy guarantees for individual-level data. It operates as a one-pass algorithm without re-accessing the historical data, thereby significantly reducing both time and space complexity. We also introduce online private statistical inference by conducting two construction procedures of valid private confidence intervals. We formally establish the convergence rates for the proposed estimators and present a functional central limit theorem to show the averaged solution path of these estimators weakly converges to a rescaled Brownian motion, providing a theoretical foundation for our online inference tool. Numerical simulation experiments demonstrate the finite-sample performance of our proposed procedure, underscoring its efficacy and reliability. Furthermore, we illustrate our method with an analysis of two datasets: the ride-sharing data and the US insurance data, showcasing its practical utility.
△ Less
Submitted 9 June, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
-
Testing Multivariate Conditional Independence Using Exchangeable Sampling and Sufficient Statistics
Authors:
Xiaotong Lin,
Jie Xie,
Fangqiao Tian,
Dongming Huang
Abstract:
We consider testing multivariate conditional independence between a response Y and a covariate vector X given additional variables Z. We introduce the Multivariate Sufficient Statistic Conditional Randomization Test (MS-CRT), which generates exchangeable copies of X by conditioning on sufficient statistics of P(X|Z). MS-CRT requires no modelling assumption on Y and accommodates any test statistics…
▽ More
We consider testing multivariate conditional independence between a response Y and a covariate vector X given additional variables Z. We introduce the Multivariate Sufficient Statistic Conditional Randomization Test (MS-CRT), which generates exchangeable copies of X by conditioning on sufficient statistics of P(X|Z). MS-CRT requires no modelling assumption on Y and accommodates any test statistics, including those derived from complex predictive models. It relaxes the assumptions of standard conditional randomization tests by allowing more unknown parameters in P(X|Z) than the sample size. MS-CRT avoids multiplicity corrections and effectively detects joint signals, even when individual components of X have only weak effects on Y . Our method extends to group selection with false discovery rate control. We develop efficient implementations for two important cases where P(X,Z) is either multivariate normal or belongs to a graphical model. For normal models, we establish the minimax rate optimality. For graphical models, we demonstrate the superior performance of our method compared to existing methods through comprehensive simulations and real-data examples.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
-
Online federated learning framework for classification
Authors:
Wenxing Guo,
Jinhan Xie,
Jianya Lu,
Bei jiang,
Hongsheng Dai,
Linglong Kong
Abstract:
In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized distance-weighted discriminant technique, making it robust to both homogeneous and heterogeneous data distributions across clients. In particular, we develop a…
▽ More
In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized distance-weighted discriminant technique, making it robust to both homogeneous and heterogeneous data distributions across clients. In particular, we develop a new optimization algorithm based on the Majorization-Minimization principle, integrated with a renewable estimation procedure, enabling efficient model updates without full retraining. We provide a theoretical guarantee for the convergence of our estimator, proving its consistency and asymptotic normality under standard regularity conditions. In addition, we establish that our method achieves Bayesian risk consistency, ensuring its reliability for classification tasks in federated environments. We further incorporate differential privacy mechanisms to enhance data security, protecting client information while maintaining model performance. Extensive numerical experiments on both simulated and real-world datasets demonstrate that our approach delivers high classification accuracy, significant computational efficiency gains, and substantial savings in data storage requirements compared to existing methods.
△ Less
Submitted 19 March, 2025;
originally announced March 2025.
-
Feature Matching Intervention: Leveraging Observational Data for Causal Representation Learning
Authors:
Haoze Li,
Jun Xie
Abstract:
A major challenge in causal discovery from observational data is the absence of perfect interventions, making it difficult to distinguish causal features from spurious ones. We propose an innovative approach, Feature Matching Intervention (FMI), which uses a matching procedure to mimic perfect interventions. We define causal latent graphs, extending structural causal models to latent feature space…
▽ More
A major challenge in causal discovery from observational data is the absence of perfect interventions, making it difficult to distinguish causal features from spurious ones. We propose an innovative approach, Feature Matching Intervention (FMI), which uses a matching procedure to mimic perfect interventions. We define causal latent graphs, extending structural causal models to latent feature space, providing a framework that connects FMI with causal graph learning. Our feature matching procedure emulates perfect interventions within these causal latent graphs. Theoretical results demonstrate that FMI exhibits strong out-of-distribution (OOD) generalizability. Experiments further highlight FMI's superior performance in effectively identifying causal features solely from observational data.
△ Less
Submitted 5 March, 2025;
originally announced March 2025.
-
Latent Thought Models with Variational Bayes Inference-Time Computation
Authors:
Deqian Kong,
Minglu Zhao,
Dehong Xu,
Bo Pang,
Shu Wang,
Edouardo Honig,
Zhangzhang Si,
Chuan Li,
Jianwen Xie,
Sirui Xie,
Ying Nian Wu
Abstract:
We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast lear…
▽ More
We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors (inference-time computation), and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional Large Language Models (LLMs), such as the number of iterations in inference-time computation and number of latent thought vectors. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling tasks. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model size, and achieve competitive performance in conditional and unconditional text generation.
△ Less
Submitted 6 June, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Representational Transfer Learning for Matrix Completion
Authors:
Yong He,
Zeyu Li,
Dong Liu,
Kangxiang Qin,
Jiahui Xie
Abstract:
We propose to transfer representational knowledge from multiple sources to a target noisy matrix completion task by aggregating singular subspaces information. Under our representational similarity framework, we first integrate linear representation information by solving a two-way principal component analysis problem based on a properly debiased matrix-valued dataset. After acquiring better colum…
▽ More
We propose to transfer representational knowledge from multiple sources to a target noisy matrix completion task by aggregating singular subspaces information. Under our representational similarity framework, we first integrate linear representation information by solving a two-way principal component analysis problem based on a properly debiased matrix-valued dataset. After acquiring better column and row representation estimators from the sources, the original high-dimensional target matrix completion problem is then transformed into a low-dimensional linear regression, of which the statistical efficiency is guaranteed. A variety of extensional arguments, including post-transfer statistical inference and robustness against negative transfer, are also discussed alongside. Finally, extensive simulation results and a number of real data cases are reported to support our claims.
△ Less
Submitted 9 December, 2024;
originally announced December 2024.
-
Compound Gaussian Radar Clutter Model With Positive Tempered Alpha-Stable Texture
Authors:
Xingxing Liao,
Junhao Xie,
Jie Zhou
Abstract:
The compound Gaussian (CG) family of distributions has achieved great success in modeling sea clutter. This work develops a flexible-tailed CG model to improve generality in clutter modeling, by introducing the positive tempered $α$-stable (PT$α$S) distribution to model clutter texture. The PT$α$S distribution exhibits widely tunable tails by tempering the heavy tails of the positive $α$-stable (P…
▽ More
The compound Gaussian (CG) family of distributions has achieved great success in modeling sea clutter. This work develops a flexible-tailed CG model to improve generality in clutter modeling, by introducing the positive tempered $α$-stable (PT$α$S) distribution to model clutter texture. The PT$α$S distribution exhibits widely tunable tails by tempering the heavy tails of the positive $α$-stable (P$α$S) distribution, thus providing greater flexibility in texture modeling. Specifically, we first develop a bivariate isotropic CG-PT$α$S complex clutter model that is defined by an explicit characteristic function, based on which the corresponding amplitude model is derived. Then, we prove that the amplitude model can be expressed as a scale mixture of Rayleighs, just as the successful compound K and Pareto models. Furthermore, a characteristic function-based method is developed to estimate the parameters of the amplitude model. Finally, real-world sea clutter data analysis indicates the amplitude model's flexibility in modeling clutter data with various tail behaviors.
△ Less
Submitted 6 December, 2024;
originally announced December 2024.
-
Latent Space Energy-based Neural ODEs
Authors:
Sheng Cheng,
Deqian Kong,
Jianwen Xie,
Kookjin Lee,
Ying Nian Wu,
Yezhou Yang
Abstract:
This paper introduces novel deep dynamical models designed to represent continuous-time sequences. Our approach employs a neural emission model to generate each data point in the time series through a non-linear transformation of a latent state vector. The evolution of these latent states is implicitly defined by a neural ordinary differential equation (ODE), with the initial state drawn from an i…
▽ More
This paper introduces novel deep dynamical models designed to represent continuous-time sequences. Our approach employs a neural emission model to generate each data point in the time series through a non-linear transformation of a latent state vector. The evolution of these latent states is implicitly defined by a neural ordinary differential equation (ODE), with the initial state drawn from an informative prior distribution parameterized by an Energy-based model (EBM). This framework is extended to disentangle dynamic states from underlying static factors of variation, represented as time-invariant variables in the latent space. We train the model using maximum likelihood estimation with Markov chain Monte Carlo (MCMC) in an end-to-end manner. Experimental results on oscillating systems, videos and real-world state sequences (MuJoCo) demonstrate that our model with the learnable energy-based prior outperforms existing counterparts, and can generalize to new dynamic parameterization, enabling long-horizon predictions.
△ Less
Submitted 5 February, 2025; v1 submitted 5 September, 2024;
originally announced September 2024.
-
DART2: a robust multiple testing method to smartly leverage helpful or misleading ancillary information
Authors:
Xuechan Li,
Jichun Xie
Abstract:
In many applications of multiple testing, ancillary information is available, reflecting the hypothesis null or alternative status. Several methods have been developed to leverage this ancillary information to enhance testing power, typically requiring the ancillary information is helpful enough to ensure favorable performance. In this paper, we develop a robust and effective distance-assisted mul…
▽ More
In many applications of multiple testing, ancillary information is available, reflecting the hypothesis null or alternative status. Several methods have been developed to leverage this ancillary information to enhance testing power, typically requiring the ancillary information is helpful enough to ensure favorable performance. In this paper, we develop a robust and effective distance-assisted multiple testing procedure named DART2, designed to be powerful and robust regardless of the quality of ancillary information. When the ancillary information is helpful, DART2 can asymptotically control FDR while improving power; otherwise, DART2 can still control FDR and maintain power at least as high as ignoring the ancillary information. We demonstrated DART2's superior performance compared to existing methods through numerical studies under various settings. In addition, DART2 has been applied to a gene association study where we have shown its superior accuracy and robustness under two different types of ancillary information.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space
Authors:
Peiyu Yu,
Dinghuai Zhang,
Hengzhi He,
Xiaojian Ma,
Ruiyao Miao,
Yifan Lu,
Yasi Zhang,
Deqian Kong,
Ruiqi Gao,
Jianwen Xie,
Guang Cheng,
Ying Nian Wu
Abstract:
Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of black-box function pose inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…
▽ More
Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of black-box function pose inherent challenges for most existing methods that model and operate directly upon input designs. These issues include but are not limited to high sample complexity, which relates to inaccurate approximation of black-box function; and insufficient coverage and exploration of input design modes, which leads to suboptimal proposal of new input designs. In this work, we consider finding a latent space that serves as a compressed yet accurate representation of the design-value joint space, enabling effective latent exploration of high-value input design modes. To this end, we formulate an learnable energy-based latent space, and propose Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov Chain Monte Carlo. The optimization process is then exploration of high-value designs guided by the learned energy-based model in the latent space, formulated as gradient-based sampling from a latent-variable-parameterized inverse model. We show that our particular parameterization encourages expanded exploration around high-value design modes, motivated by inversion thinking of a fundamental result of conditional covariance matrix typically used for variance reduction. We observe that our method, backed by an accurately learned informative latent space and an expanding-exploration model design, yields significant improvements over strong previous methods on both synthetic and real world datasets such as the design-bench suite.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models
Authors:
Belhal Karimi,
Jianwen Xie,
Ping Li
Abstract:
We propose in this paper, STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based mo…
▽ More
We propose in this paper, STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose in this paper, a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Molecule Design by Latent Prompt Transformer
Authors:
Deqian Kong,
Yuhao Huang,
Jianwen Xie,
Ying Nian Wu
Abstract:
This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation…
▽ More
This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state of the art performances on several benchmark molecule design tasks.
△ Less
Submitted 5 February, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood
Authors:
Yaxuan Zhu,
Jianwen Xie,
Yingnian Wu,
Ruiqi Gao
Abstract:
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming, and there exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likeli…
▽ More
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming, and there exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the two models are jointly estimated within a cooperative training framework: samples from the initializer serve as starting points that are refined by a few MCMC sampling steps from the EBM. The EBM is then optimized by maximizing recovery likelihood, while the initializer model is optimized by learning from the difference between the refined samples and the initial samples. In addition, we made several practical designs for EBM training to further improve the sample quality. Combining these advances, our approach significantly boost the generation performance compared to existing EBM methods on CIFAR-10 and ImageNet datasets. We also demonstrate the effectiveness of our models for several downstream tasks, including classifier-free guided generation, compositional generation, image inpainting and out-of-distribution detection.
△ Less
Submitted 10 November, 2024; v1 submitted 10 September, 2023;
originally announced September 2023.
-
Likelihood-Based Generative Radiance Field with Latent Space Energy-Based Model for 3D-Aware Disentangled Image Representation
Authors:
Yaxuan Zhu,
Jianwen Xie,
Ping Li
Abstract:
We propose the NeRF-LEBM, a likelihood-based top-down 3D-aware 2D image generative model that incorporates 3D representation via Neural Radiance Fields (NeRF) and 2D imaging process via differentiable volume rendering. The model represents an image as a rendering process from 3D object to 2D image and is conditioned on some latent variables that account for object characteristics and are assumed t…
▽ More
We propose the NeRF-LEBM, a likelihood-based top-down 3D-aware 2D image generative model that incorporates 3D representation via Neural Radiance Fields (NeRF) and 2D imaging process via differentiable volume rendering. The model represents an image as a rendering process from 3D object to 2D image and is conditioned on some latent variables that account for object characteristics and are assumed to follow informative trainable energy-based prior models. We propose two likelihood-based learning frameworks to train the NeRF-LEBM: (i) maximum likelihood estimation with Markov chain Monte Carlo-based inference and (ii) variational inference with the reparameterization trick. We study our models in the scenarios with both known and unknown camera poses. Experiments on several benchmark datasets demonstrate that the NeRF-LEBM can infer 3D object structures from 2D images, generate 2D images with novel views and objects, learn from incomplete 2D images, and learn from 2D images with known or unknown camera poses.
△ Less
Submitted 16 April, 2023;
originally announced April 2023.
-
Depression Detection Using Digital Traces on Social Media: A Knowledge-aware Deep Learning Approach
Authors:
Wenli Zhang,
Jiaheng Xie,
Zhu Zhang,
Xiang Liu
Abstract:
Depression is a common disease worldwide. It is difficult to diagnose and continues to be underdiagnosed. Because depressed patients constantly share their symptoms, major life events, and treatments on social media, researchers are turning to user-generated digital traces on social media for depression detection. Such methods have distinct advantages in combating depression because they can facil…
▽ More
Depression is a common disease worldwide. It is difficult to diagnose and continues to be underdiagnosed. Because depressed patients constantly share their symptoms, major life events, and treatments on social media, researchers are turning to user-generated digital traces on social media for depression detection. Such methods have distinct advantages in combating depression because they can facilitate innovative approaches to fight depression and alleviate its social and economic burden. However, most existing studies lack effective means to incorporate established medical domain knowledge in depression detection or suffer from feature extraction difficulties that impede greater performance. Following the design science research paradigm, we propose a Deep Knowledge-aware Depression Detection (DKDD) framework to accurately detect social media users at risk of depression and explain the critical factors that contribute to such detection. Extensive empirical studies with real-world data demonstrate that, by incorporating domain knowledge, our method outperforms existing state-of-the-art methods. Our work has significant implications for IS research in knowledge-aware machine learning, digital traces utilization, and NLP research in IS. Practically, by providing early detection and explaining the critical factors, DKDD can supplement clinical depression screening and enable large-scale evaluations of a population's mental health status.
△ Less
Submitted 1 August, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Extreme eigenvalues of sample covariance matrices under generalized elliptical models with applications
Authors:
Xiucai Ding,
Jiahui Xie,
Long Yu,
Wang Zhou
Abstract:
We consider the extreme eigenvalues of the sample covariance matrix $Q=YY^*$ under the generalized elliptical model that $Y=Σ^{1/2}XD.$ Here $Σ$ is a bounded $p \times p$ positive definite deterministic matrix representing the population covariance structure, $X$ is a $p \times n$ random matrix containing either independent columns sampled from the unit sphere in $\mathbb{R}^p$ or i.i.d. centered…
▽ More
We consider the extreme eigenvalues of the sample covariance matrix $Q=YY^*$ under the generalized elliptical model that $Y=Σ^{1/2}XD.$ Here $Σ$ is a bounded $p \times p$ positive definite deterministic matrix representing the population covariance structure, $X$ is a $p \times n$ random matrix containing either independent columns sampled from the unit sphere in $\mathbb{R}^p$ or i.i.d. centered entries with variance $n^{-1},$ and $D$ is a diagonal random matrix containing i.i.d. entries and independent of $X.$ Such a model finds important applications in statistics and machine learning.
In this paper, assuming that $p$ and $n$ are comparably large, we prove that the extreme edge eigenvalues of $Q$ can have several types of distributions depending on $Σ$ and $D$ asymptotically. These distributions include: Gumbel, Fréchet, Weibull, Tracy-Widom, Gaussian and their mixtures. On the one hand, when the random variables in $D$ have unbounded support, the edge eigenvalues of $Q$ can have either Gumbel or Fréchet distribution depending on the tail decay property of $D.$ On the other hand, when the random variables in $D$ have bounded support, under some mild regularity assumptions on $Σ,$ the edge eigenvalues of $Q$ can exhibit Weibull, Tracy-Widom, Gaussian or their mixtures. Based on our theoretical results, we consider two important applications. First, we propose some statistics and procedure to detect and estimate the possible spikes for elliptically distributed data. Second, in the context of a factor model, by using the multiplier bootstrap procedure via selecting the weights in $D,$ we propose a new algorithm to infer and estimate the number of factors in the factor model. Numerical simulations also confirm the accuracy and powerfulness of our proposed methods and illustrate better performance compared to some existing methods in the literature.
△ Less
Submitted 19 April, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Scalable inference in functional linear regression with streaming data
Authors:
Jinhan Xie,
Enze Shi,
Peijun Sang,
Zuofeng Shang,
Bei Jiang,
Linglong Kong
Abstract:
Traditional static functional data analysis is facing new challenges due to streaming data, where data constantly flow in. A major challenge is that storing such an ever-increasing amount of data in memory is nearly impossible. In addition, existing inferential tools in online learning are mainly developed for finite-dimensional problems, while inference methods for functional data are focused on…
▽ More
Traditional static functional data analysis is facing new challenges due to streaming data, where data constantly flow in. A major challenge is that storing such an ever-increasing amount of data in memory is nearly impossible. In addition, existing inferential tools in online learning are mainly developed for finite-dimensional problems, while inference methods for functional data are focused on the batch learning setting. In this paper, we tackle these issues by developing functional stochastic gradient descent algorithms and proposing an online bootstrap resampling procedure to systematically study the inference problem for functional linear regression. In particular, the proposed estimation and inference procedures use only one pass over the data; thus they are easy to implement and suitable to the situation where data arrive in a streaming manner. Furthermore, we establish the convergence rate as well as the asymptotic distribution of the proposed estimator. Meanwhile, the proposed perturbed estimator from the bootstrap procedure is shown to enjoy the same theoretical properties, which provide the theoretical justification for our online inference tool. As far as we know, this is the first inference result on the functional linear regression model with streaming data. Simulation studies are conducted to investigate the finite-sample performance of the proposed procedure. An application is illustrated with the Beijing multi-site air-quality data.
△ Less
Submitted 10 October, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
-
A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference
Authors:
Jianwen Xie,
Yaxuan Zhu,
Yifei Xu,
Dingcheng Li,
Ping Li
Abstract:
We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow model plays the role of the informative prior model of the generator. We propose to jointly learn the latent space normalizing flow prior model and the top-down generator model by a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run Langevin sampling from…
▽ More
We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow model plays the role of the informative prior model of the generator. We propose to jointly learn the latent space normalizing flow prior model and the top-down generator model by a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run Langevin sampling from the intractable posterior distribution is performed to infer the latent variables for each observed example, so that the parameters of the normalizing flow prior and the generator can be updated with the inferred latent variables. We show that, under the scenario of non-convergent short-run MCMC, the finite step Langevin dynamics is a flow-like approximate inference model and the learning objective actually follows the perturbation of the maximum likelihood estimation (MLE). We further point out that the learning framework seeks to (i) match the latent space normalizing flow and the aggregated posterior produced by the short-run Langevin flow, and (ii) bias the model from MLE such that the short-run Langevin flow inference is close to the true posterior. Empirical results of extensive experiments validate the effectiveness of the proposed latent space normalizing flow model in the tasks of image generation, image reconstruction, anomaly detection, supervised image inpainting and unsupervised image recovery.
△ Less
Submitted 23 January, 2023;
originally announced January 2023.
-
How We Express Ourselves Freely: Censorship, Self-censorship, and Anti-censorship on a Chinese Social Media
Authors:
Xiang Chen,
Jiamu Xie,
Zixin Wang,
Bohui Shen,
Zhixuan Zhou
Abstract:
Censorship, anti-censorship, and self-censorship in an authoritarian regime have been extensively studies, yet the relationship between these intertwined factors is not well understood. In this paper, we report results of a large-scale survey study (N = 526) with Sina Weibo users toward bridging this research gap. Through descriptive statistics, correlation analysis, and regression analysis, we un…
▽ More
Censorship, anti-censorship, and self-censorship in an authoritarian regime have been extensively studies, yet the relationship between these intertwined factors is not well understood. In this paper, we report results of a large-scale survey study (N = 526) with Sina Weibo users toward bridging this research gap. Through descriptive statistics, correlation analysis, and regression analysis, we uncover how users are being censored, how and why they conduct self-censorship on different topics and in different scenarios (i.e., post, repost, and comment), and their various anti-censorship strategies. We further identify the metrics of censorship and self-censorship, find the influence factors, and construct a mediation model to measure their relationship. Based on these findings, we discuss implications for democratic social media design and future censorship research.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
Estimating heterogeneous treatment effects versus building individualized treatment rules: Connection and disconnection
Authors:
Zhongyuan Chen,
Jun Xie
Abstract:
Estimating heterogeneous treatment effects is a well-studied topic in the statistics literature. More recently, it has regained attention due to an increasing need for precision medicine as well as the increased use of state-of-art machine learning methods in the estimation. Furthermore, estimating heterogeneous treatment effects is directly related to building an individualized treatment rule, wh…
▽ More
Estimating heterogeneous treatment effects is a well-studied topic in the statistics literature. More recently, it has regained attention due to an increasing need for precision medicine as well as the increased use of state-of-art machine learning methods in the estimation. Furthermore, estimating heterogeneous treatment effects is directly related to building an individualized treatment rule, which is a decision rule of treatment according to patient characteristics. This paper examines the connection and disconnection between these two research problems. Notably, a better estimation of the heterogeneous treatment effects may or may not lead to a better individualized treatment rule. We provide theoretical frameworks to explain the connection and disconnection and demonstrate two different scenarios through simulations. Our conclusion sheds light on a practical guide that under certain circumstances, there is no need to enhance estimation of the treatment effects, as it does not alter the treatment decision.
△ Less
Submitted 3 October, 2022;
originally announced October 2022.
-
Inference on High-dimensional Single-index Models with Streaming Data
Authors:
Dongxiao Han,
Jinhan Xie,
Jin Liu,
Liuquan Sun,
Jian Huang,
Bei Jian,
Linglong Kong
Abstract:
Traditional statistical methods are faced with new challenges due to streaming data. The major challenge is the rapidly growing volume and velocity of data, which makes storing such huge datasets in memory impossible. The paper presents an online inference framework for regression parameters in high-dimensional semiparametric single-index models with unknown link functions. The proposed online pro…
▽ More
Traditional statistical methods are faced with new challenges due to streaming data. The major challenge is the rapidly growing volume and velocity of data, which makes storing such huge datasets in memory impossible. The paper presents an online inference framework for regression parameters in high-dimensional semiparametric single-index models with unknown link functions. The proposed online procedure updates only the current data batch and summary statistics of historical data instead of re-accessing the entire raw data set. At the same time, we do not need to estimate the unknown link function, which is a highly challenging task. In addition, a generalized convex loss function is used in the proposed inference procedure. To illustrate the proposed method, we use the Huber loss function and the logistic regression model's negative log-likelihood. In this study, the asymptotic normality of the proposed online debiased Lasso estimators and the bounds of the proposed online Lasso estimators are investigated. To evaluate the performance of the proposed method, extensive simulation studies have been conducted. We provide applications to Nasdaq stock prices and financial distress datasets.
△ Less
Submitted 3 October, 2022;
originally announced October 2022.
-
A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model
Authors:
Jianwen Xie,
Yaxuan Zhu,
Jun Li,
Ping Li
Abstract:
This paper studies the cooperative learning of two generative flow models, in which the two models are iteratively updated based on the jointly synthesized examples. The first flow model is a normalizing flow that transforms an initial simple density to a target density by applying a sequence of invertible transformations. The second flow model is a Langevin flow that runs finite steps of gradient…
▽ More
This paper studies the cooperative learning of two generative flow models, in which the two models are iteratively updated based on the jointly synthesized examples. The first flow model is a normalizing flow that transforms an initial simple density to a target density by applying a sequence of invertible transformations. The second flow model is a Langevin flow that runs finite steps of gradient-based MCMC toward an energy-based model. We start from proposing a generative framework that trains an energy-based model with a normalizing flow as an amortized sampler to initialize the MCMC chains of the energy-based model. In each learning iteration, we generate synthesized examples by using a normalizing flow initialization followed by a short-run Langevin flow revision toward the current energy-based model. Then we treat the synthesized examples as fair samples from the energy-based model and update the model parameters with the maximum likelihood learning gradient, while the normalizing flow directly learns from the synthesized examples by maximizing the tractable likelihood. Under the short-run non-mixing MCMC scenario, the estimation of the energy-based model is shown to follow the perturbation of maximum likelihood, and the short-run Langevin flow and the normalizing flow form a two-flow generator that we call CoopFlow. We provide an understating of the CoopFlow algorithm by information geometry and show that it is a valid generator as it converges to a moment matching estimator. We demonstrate that the trained CoopFlow is capable of synthesizing realistic images, reconstructing images, and interpolating between images.
△ Less
Submitted 13 May, 2022;
originally announced May 2022.
-
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
Authors:
Jiangtao Xie,
Fei Long,
Jiaming Lv,
Qilong Wang,
Peihua Li
Abstract:
Few-shot classification is a challenging problem as only very few training examples are given for each new task. One of the effective research lines to address this challenge focuses on learning deep representations driven by a similarity measure between a query image and few support images of some class. Statistically, this amounts to measure the dependency of image features, viewed as random vec…
▽ More
Few-shot classification is a challenging problem as only very few training examples are given for each new task. One of the effective research lines to address this challenge focuses on learning deep representations driven by a similarity measure between a query image and few support images of some class. Statistically, this amounts to measure the dependency of image features, viewed as random vectors in a high-dimensional embedding space. Previous methods either only use marginal distributions without considering joint distributions, suffering from limited representation capability, or are computationally expensive though harnessing joint distributions. In this paper, we propose a deep Brownian Distance Covariance (DeepBDC) method for few-shot classification. The central idea of DeepBDC is to learn image representations by measuring the discrepancy between joint characteristic functions of embedded features and product of the marginals. As the BDC metric is decoupled, we formulate it as a highly modular and efficient layer. Furthermore, we instantiate DeepBDC in two different few-shot classification frameworks. We make experiments on six standard few-shot image benchmarks, covering general object recognition, fine-grained categorization and cross-domain classification. Extensive evaluations show our DeepBDC significantly outperforms the counterparts, while establishing new state-of-the-art results. The source code is available at http://www.peihuali.org/DeepBDC
△ Less
Submitted 9 April, 2022;
originally announced April 2022.
-
A New Multivariate Predictive Model for Stock Returns
Authors:
Jianying Xie
Abstract:
One of the most important studies in finance is to find out whether stock returns could be predicted. This research aims to create a new multivariate model, which includes dividend yield, earnings-to-price ratio, book-to-market ratio as well as consumption-wealth ratio as explanatory variables, for future stock returns predictions. The new multivariate model will be assessed for its forecasting pe…
▽ More
One of the most important studies in finance is to find out whether stock returns could be predicted. This research aims to create a new multivariate model, which includes dividend yield, earnings-to-price ratio, book-to-market ratio as well as consumption-wealth ratio as explanatory variables, for future stock returns predictions. The new multivariate model will be assessed for its forecasting performance using empirical analysis. The empirical analysis is performed on S&P500 quarterly data from Quarter 1, 1952 to Quarter 4, 2019 as well as S&P500 monthly data from Month 12, 1920 to Month 12, 2019. Results have shown this new multivariate model has predictability for future stock returns. When compared to other benchmark models, the new multivariate model performs the best in terms of the Root Mean Squared Error (RMSE) most of the time.
△ Less
Submitted 5 October, 2021;
originally announced October 2021.
-
Data-guided Treatment Recommendation with Feature Scores
Authors:
Zhongyuan Chen,
Ziyi Wang,
Qifan Song,
Jun Xie
Abstract:
Despite the availability of large amounts of genomics data, medical treatment recommendations have not successfully used them. In this paper, we consider the utility of high dimensional genomic-clinical data and nonparametric methods for making cancer treatment recommendations. This builds upon the framework of the individualized treatment rule [Qian and Murphy 2011] but we aim to overcome their m…
▽ More
Despite the availability of large amounts of genomics data, medical treatment recommendations have not successfully used them. In this paper, we consider the utility of high dimensional genomic-clinical data and nonparametric methods for making cancer treatment recommendations. This builds upon the framework of the individualized treatment rule [Qian and Murphy 2011] but we aim to overcome their method's limitations, specifically in the instances when the method encounters a large number of covariates and an issue of model misspecification. We tackle this problem using a dimension reduction method, namely Sliced Inverse Regression (SIR, [Li 1991]), with a rich class of models for the treatment response. Notably, SIR defines a feature space for high-dimensional data, offering an advantage similar to those found in the popular neural network models. With the features obtained from SIR, a simple visualization is used to compare different treatment options and present the recommended treatment. Additionally, we derive the consistency and the convergence rate of the proposed recommendation approach through a value function. The effectiveness of the proposed approach is demonstrated through simulation studies and the promising results from a real-data example of the treatment of multiple myeloma.
△ Less
Submitted 19 February, 2022; v1 submitted 9 August, 2021;
originally announced August 2021.
-
Distance Assisted Recursive Testing
Authors:
Xuechan Li,
Anthony Sung,
Jichun Xie
Abstract:
In many applications, a large number of features are collected with the goal to identify a few important ones. Sometimes, these features lie in a metric space with a known distance matrix, which partially reflects their co-importance pattern. Proper use of the distance matrix will boost the power of identifying important features. Hence, we develop a new multiple testing framework named the Distan…
▽ More
In many applications, a large number of features are collected with the goal to identify a few important ones. Sometimes, these features lie in a metric space with a known distance matrix, which partially reflects their co-importance pattern. Proper use of the distance matrix will boost the power of identifying important features. Hence, we develop a new multiple testing framework named the Distance Assisted Recursive Testing (DART). DART has two stages. In stage 1, we transform the distance matrix into an aggregation tree, where each node represents a set of features. In stage 2, based on the aggregation tree, we set up dynamic node hypotheses and perform multiple testing on the tree. All rejections are mapped back to the features. Under mild assumptions, the false discovery proportion of DART converges to the desired level in high probability converging to one. We illustrate by theory and simulations that DART has superior performance under various models compared to the existing methods. We applied DART to a clinical trial in the allogeneic stem cell transplantation study to identify the gut microbiota whose abundance will be impacted by the after-transplant care.
△ Less
Submitted 24 September, 2021; v1 submitted 19 March, 2021;
originally announced March 2021.
-
Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler
Authors:
Jianwen Xie,
Zilong Zheng,
Ping Li
Abstract:
Due to the intractable partition function, training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling to approximate the gradient of the Kullback-Leibler divergence between data and model distributions. However, it is non-trivial to sample from an EBM because of the difficulty of mixing between modes. In this paper, we propose to learn a variational…
▽ More
Due to the intractable partition function, training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling to approximate the gradient of the Kullback-Leibler divergence between data and model distributions. However, it is non-trivial to sample from an EBM because of the difficulty of mixing between modes. In this paper, we propose to learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics that is derived from the energy function, for efficient amortized sampling of the EBM. With these amortized MCMC samples, the EBM can be trained by maximum likelihood, which follows an "analysis by synthesis" scheme; while the VAE learns from these MCMC samples via variational Bayes. We call this joint training algorithm the variational MCMC teaching, in which the VAE chases the EBM toward data distribution. We interpret the learning algorithm as a dynamic alternating projection in the context of information geometry. Our proposed models can generate samples comparable to GANs and EBMs. Additionally, we demonstrate that our model can learn effective probabilistic distribution toward supervised conditional learning tasks.
△ Less
Submitted 23 December, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference
Authors:
Jiyang Xie,
Zhanyu Ma,
Jing-Hao Xue,
Guoqiang Zhang,
Jun Guo
Abstract:
This paper proposes a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition. In the DS-UI, we combine the classifier of a DNN, i.e., the last fully-connected (FC) layer, with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer. Unlike existing UI methods fo…
▽ More
This paper proposes a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition. In the DS-UI, we combine the classifier of a DNN, i.e., the last fully-connected (FC) layer, with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer. Unlike existing UI methods for DNNs, which only calculate the means or modes of the DNN outputs' distributions, the proposed MoGMM-FC layer acts as a probabilistic interpreter for the features that are inputs of the classifier to directly calculate the probability density of them for the DS-UI. In addition, we propose a dual-supervised stochastic gradient-based variational Bayes (DS-SGVB) algorithm for the MoGMM-FC layer optimization. Unlike conventional SGVB and optimization algorithms in other UI methods, the DS-SGVB not only models the samples in the specific class for each Gaussian mixture model (GMM) in the MoGMM, but also considers the negative samples from other classes for the GMM to reduce the intra-class distances and enlarge the inter-class margins simultaneously for enhancing the learning ability of the MoGMM-FC layer in the DS-UI. Experimental results show the DS-UI outperforms the state-of-the-art UI methods in misclassification detection. We further evaluate the DS-UI in open-set out-of-domain/-distribution detection and find statistically significant improvements. Visualizations of the feature spaces demonstrate the superiority of the DS-UI.
△ Less
Submitted 17 November, 2020;
originally announced November 2020.
-
Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization
Authors:
Jiyang Xie,
Zhanyu Ma,
and Jianjun Lei,
Guoqiang Zhang,
Jing-Hao Xue,
Zheng-Hua Tan,
Jun Guo
Abstract:
Due to lack of data, overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs). We propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts dropout rate. Specifically, the distri…
▽ More
Due to lack of data, overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs). We propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes in order to carry out an end-to-end training. We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets (five small-scale datasets and two large-scale datasets) with various base models. The advanced dropout outperforms all the referred techniques on all the datasets.We further compare the effectiveness ratios and find that advanced dropout achieves the highest one on most cases. Next, we conduct a set of analysis of dropout rate characteristics, including convergence of the adaptive dropout rate, the learned distributions of dropout masks, and a comparison with dropout rate generation without an explicit distribution. In addition, the ability of overfitting prevention is evaluated and confirmed. Finally, we extend the application of the advanced dropout to uncertainty inference, network pruning, text classification, and regression. The proposed advanced dropout is also superior to the corresponding referred methods. Codes are available at https://github.com/PRIS-CV/AdvancedDropout.
△ Less
Submitted 10 August, 2021; v1 submitted 11 October, 2020;
originally announced October 2020.
-
Variational Autoencoder for Anti-Cancer Drug Response Prediction
Authors:
Hongyuan Dong,
Jiaqing Xie,
Zhi Jing,
Dexin Ren
Abstract:
Cancer is a primary cause of human death, but discovering drugs and tailoring cancer therapies are expensive and time-consuming. We seek to facilitate the discovery of new drugs and treatment strategies for cancer using variational autoencoders (VAEs) and multi-layer perceptrons (MLPs) to predict anti-cancer drug responses. Our model takes as input gene expression data of cancer cell lines and ant…
▽ More
Cancer is a primary cause of human death, but discovering drugs and tailoring cancer therapies are expensive and time-consuming. We seek to facilitate the discovery of new drugs and treatment strategies for cancer using variational autoencoders (VAEs) and multi-layer perceptrons (MLPs) to predict anti-cancer drug responses. Our model takes as input gene expression data of cancer cell lines and anti-cancer drug molecular data and encodes these data with our {\sc {GeneVae}} model, which is an ordinary VAE model, and a rectified junction tree variational autoencoder ({\sc JTVae}) model, respectively. A multi-layer perceptron processes these encoded features to produce a final prediction. Our tests show our system attains a high average coefficient of determination ($R^{2} = 0.83$) in predicting drug responses for breast cancer cell lines and an average $R^{2} = 0.845$ for pan-cancer cell lines. Additionally, we show that our model can generates effective drug compounds not previously used for specific cancer cell lines.
△ Less
Submitted 15 April, 2021; v1 submitted 22 August, 2020;
originally announced August 2020.
-
Dynamic Bidding Strategies with Multivariate Feedback Control for Multiple Goals in Display Advertising
Authors:
Michael Tashman,
Jiayi Xie,
John Hoffman,
Lee Winikor,
Rouzbeh Gerami
Abstract:
Real-Time Bidding (RTB) display advertising is a method for purchasing display advertising inventory in auctions that occur within milliseconds. The performance of RTB campaigns is generally measured with a series of Key Performance Indicators (KPIs) - measurements used to ensure that the campaign is cost-effective and that it is purchasing valuable inventory. While an RTB campaign should ideally…
▽ More
Real-Time Bidding (RTB) display advertising is a method for purchasing display advertising inventory in auctions that occur within milliseconds. The performance of RTB campaigns is generally measured with a series of Key Performance Indicators (KPIs) - measurements used to ensure that the campaign is cost-effective and that it is purchasing valuable inventory. While an RTB campaign should ideally meet all KPIs, simultaneous improvement tends to be very challenging, as an improvement to any one KPI risks a detrimental effect toward the others. Here we present an approach to simultaneously controlling multiple KPIs with a PID-based feedback-control system. This method generates a control score for each KPI, based on both the output of a PID controller module and a metric that quantifies the importance of each KPI for internal business needs. On regular intervals, this algorithm - Sequential Control - will choose the KPI with the greatest overall need for improvement. In this way, our algorithm is able to continually seek the greatest marginal improvements to its current state. Multiple methods of control can be associated with each KPI, and can be triggered either simultaneously or chosen stochastically, in order to avoid local optima. In both offline ad bidding simulations and testing on live traffic, our methods proved to be effective in simultaneously controlling multiple KPIs, and bringing them toward their respective goals.
△ Less
Submitted 1 June, 2020;
originally announced July 2020.
-
On Path Integration of Grid Cells: Group Representation and Isotropic Scaling
Authors:
Ruiqi Gao,
Jianwen Xie,
Xue-Xin Wei,
Song-Chun Zhu,
Ying Nian Wu
Abstract:
Understanding how grid cells perform path integration calculations remains a fundamental problem. In this paper, we conduct theoretical analysis of a general representation model of path integration by grid cells, where the 2D self-position is encoded as a higher dimensional vector, and the 2D self-motion is represented by a general transformation of the vector. We identify two conditions on the t…
▽ More
Understanding how grid cells perform path integration calculations remains a fundamental problem. In this paper, we conduct theoretical analysis of a general representation model of path integration by grid cells, where the 2D self-position is encoded as a higher dimensional vector, and the 2D self-motion is represented by a general transformation of the vector. We identify two conditions on the transformation. One is a group representation condition that is necessary for path integration. The other is an isotropic scaling condition that ensures locally conformal embedding, so that the error in the vector representation translates conformally to the error in the 2D self-position. Then we investigate the simplest transformation, i.e., the linear transformation, uncover its explicit algebraic and geometric structure as matrix Lie group of rotation, and explore the connection between the isotropic scaling condition and a special class of hexagon grid patterns. Finally, with our optimization-based approach, we manage to learn hexagon grid patterns that share similar properties of the grid cells in the rodent brain. The learned model is capable of accurate long distance path integration. Code is available at https://github.com/ruiqigao/grid-cell-path.
△ Less
Submitted 3 November, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Active Learning for Skewed Data Sets
Authors:
Abbas Kazerouni,
Qi Zhao,
Jing Xie,
Sandeep Tata,
Marc Najork
Abstract:
Consider a sequential active learning problem where, at each round, an agent selects a batch of unlabeled data points, queries their labels and updates a binary classifier. While there exists a rich body of work on active learning in this general form, in this paper, we focus on problems with two distinguishing characteristics: severe class imbalance (skew) and small amounts of initial training da…
▽ More
Consider a sequential active learning problem where, at each round, an agent selects a batch of unlabeled data points, queries their labels and updates a binary classifier. While there exists a rich body of work on active learning in this general form, in this paper, we focus on problems with two distinguishing characteristics: severe class imbalance (skew) and small amounts of initial training data. Both of these problems occur with surprising frequency in many web applications. For instance, detecting offensive or sensitive content in online communities (pornography, violence, and hate-speech) is receiving enormous attention from industry as well as research communities. Such problems have both the characteristics we describe -- a vast majority of content is not offensive, so the number of positive examples for such content is orders of magnitude smaller than the negative examples. Furthermore, there is usually only a small amount of initial training data available when building machine-learned models to solve such problems. To address both these issues, we propose a hybrid active learning algorithm (HAL) that balances exploiting the knowledge available through the currently labeled training examples with exploring the large amount of unlabeled data available. Through simulation results, we show that HAL makes significantly better choices for what points to label when compared to strong baselines like margin-sampling. Classifiers trained on the examples selected for labeling by HAL easily out-perform the baselines on target metrics (like area under the precision-recall curve) given the same budget for labeling examples. We believe HAL offers a simple, intuitive, and computationally tractable way to structure active learning for a wide range of machine learning applications.
△ Less
Submitted 22 May, 2020;
originally announced May 2020.
-
GPCA: A Probabilistic Framework for Gaussian Process Embedded Channel Attention
Authors:
Jiyang Xie,
Dongliang Chang,
Zhanyu Ma,
Guoqiang Zhang,
Jun Guo
Abstract:
Channel attention mechanisms have been commonly applied in many visual tasks for effective performance improvement. It is able to reinforce the informative channels as well as to suppress the useless channels. Recently, different channel attention modules have been proposed and implemented in various ways. Generally speaking, they are mainly based on convolution and pooling operations. In this pap…
▽ More
Channel attention mechanisms have been commonly applied in many visual tasks for effective performance improvement. It is able to reinforce the informative channels as well as to suppress the useless channels. Recently, different channel attention modules have been proposed and implemented in various ways. Generally speaking, they are mainly based on convolution and pooling operations. In this paper, we propose Gaussian process embedded channel attention (GPCA) module and further interpret the channel attention schemes in a probabilistic way. The GPCA module intends to model the correlations among the channels, which are assumed to be captured by beta distributed variables. As the beta distribution cannot be integrated into the end-to-end training of convolutional neural networks (CNNs) with a mathematically tractable solution, we utilize an approximation of the beta distribution to solve this problem. To specify, we adapt a Sigmoid-Gaussian approximation, in which the Gaussian distributed variables are transferred into the interval [0,1]. The Gaussian process is then utilized to model the correlations among different channels. In this case, a mathematically tractable solution is derived. The GPCA module can be efficiently implemented and integrated into the end-to-end training of the CNNs. Experimental results demonstrate the promising performance of the proposed GPCA module. Codes are available at https://github.com/PRIS-CV/GPCA.
△ Less
Submitted 10 August, 2021; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Scalable Bid Landscape Forecasting in Real-time Bidding
Authors:
Aritra Ghosh,
Saayan Mitra,
Somdeb Sarkhel,
Jason Xie,
Gang Wu,
Viswanathan Swaminathan
Abstract:
In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real-time. The highest bidding advertiser wins but pays only the second-highest bid (known as the winning price). In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder's perspective. However, in a practical setting, with budget constraints, bidding the true v…
▽ More
In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real-time. The highest bidding advertiser wins but pays only the second-highest bid (known as the winning price). In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder's perspective. However, in a practical setting, with budget constraints, bidding the true value is a sub-optimal strategy. Hence, to devise an optimal bidding strategy, it is of utmost importance to learn the winning price distribution accurately. Moreover, a demand-side platform (DSP), which bids on behalf of advertisers, observes the winning price if it wins the auction. For losing auctions, DSPs can only treat its bidding price as the lower bound for the unknown winning price. In literature, typically censored regression is used to model such partially observed data. A common assumption in censored regression is that the winning price is drawn from a fixed variance (homoscedastic) uni-modal distribution (most often Gaussian). However, in reality, these assumptions are often violated. We relax these assumptions and propose a heteroscedastic fully parametric censored regression approach, as well as a mixture density censored network. Our approach not only generalizes censored regression but also provides flexibility to model arbitrarily distributed real-world data. Experimental evaluation on the publicly available dataset for winning price estimation demonstrates the effectiveness of our method. Furthermore, we evaluate our algorithm on one of the largest demand-side platforms and significant improvement has been achieved in comparison with the baseline solutions.
△ Less
Submitted 17 January, 2020;
originally announced January 2020.
-
Representation Learning: A Statistical Perspective
Authors:
Jianwen Xie,
Ruiqi Gao,
Erik Nijkamp,
Song-Chun Zhu,
Ying Nian Wu
Abstract:
Learning representations of data is an important problem in statistics and machine learning. While the origin of learning representations can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning…
▽ More
Learning representations of data is an important problem in statistics and machine learning. While the origin of learning representations can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning representations from a statistical perspective. In particular, we review the following two themes: (a) unsupervised learning of vector representations and (b) learning of both vector and matrix representations.
△ Less
Submitted 26 November, 2019;
originally announced November 2019.
-
Efficient Projection-Free Online Methods with Stochastic Recursive Gradient
Authors:
Jiahao Xie,
Zebang Shen,
Chao Zhang,
Boyu Wang,
Hui Qian
Abstract:
This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-iteration computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By e…
▽ More
This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-iteration computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while possessing low per-iteration computational costs. Experimental results demonstrate the efficiency of the proposed methods compared to state-of-the-arts.
△ Less
Submitted 23 October, 2019; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Aggregated Gradient Langevin Dynamics
Authors:
Chao Zhang,
Jiahao Xie,
Zebang Shen,
Peilin Zhao,
Tengfei Zhou,
Hui Qian
Abstract:
In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e.g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the fi…
▽ More
In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e.g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the first time that bounds for I/O friendly strategies such as cyclic access and random reshuffle have been established in the MCMC literature. The theoretic results also indicate that methods in AGLD possess the merits of both the low per-iteration computational complexity and the short mixture time. Empirical studies demonstrate that our framework allows to derive novel schemes to generate high-quality samples for large-scale Bayesian posterior learning tasks.
△ Less
Submitted 21 October, 2019;
originally announced October 2019.
-
Deep Kernel Learning via Random Fourier Features
Authors:
Jiaxuan Xie,
Fanghui Liu,
Kaijie Wang,
Xiaolin Huang
Abstract:
Kernel learning methods are among the most effective learning methods and have been vigorously studied in the past decades. However, when tackling with complicated tasks, classical kernel methods are not flexible or "rich" enough to describe the data and hence could not yield satisfactory performance. In this paper, via Random Fourier Features (RFF), we successfully incorporate the deep architectu…
▽ More
Kernel learning methods are among the most effective learning methods and have been vigorously studied in the past decades. However, when tackling with complicated tasks, classical kernel methods are not flexible or "rich" enough to describe the data and hence could not yield satisfactory performance. In this paper, via Random Fourier Features (RFF), we successfully incorporate the deep architecture into kernel learning, which significantly boosts the flexibility and richness of kernel machines while keeps kernels' advantage of pairwise handling small data. With RFF, we could establish a deep structure and make every kernel in RFF layers could be trained end-to-end. Since RFF with different distributions could represent different kernels, our model has the capability of finding suitable kernels for each layer, which is much more flexible than traditional kernel-based methods where the kernel is pre-selected. This fact also helps yield a more sophisticated kernel cascade connection in the architecture. On small datasets (less than 1000 samples), for which deep learning is generally not suitable due to overfitting, our method achieves superior performance compared to advanced kernel methods. On large-scale datasets, including non-image and image classification tasks, our method also has competitive performance.
△ Less
Submitted 7 October, 2019;
originally announced October 2019.
-
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing
Authors:
Jian Guo,
He He,
Tong He,
Leonard Lausen,
Mu Li,
Haibin Lin,
Xingjian Shi,
Chenguang Wang,
Junyuan Xie,
Sheng Zha,
Aston Zhang,
Hang Zhang,
Zhi Zhang,
Zhongyue Zhang,
Shuai Zheng,
Yi Zhu
Abstract:
We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customiza…
▽ More
We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.
△ Less
Submitted 12 February, 2020; v1 submitted 9 July, 2019;
originally announced July 2019.
-
TEAM: A Multiple Testing Algorithm on the Aggregation Tree for Flow Cytometry Analysis
Authors:
John Pura,
Xuechan Li,
Cliburn Chan,
Jichun Xie
Abstract:
In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to pinpoint the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (PDFs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs…
▽ More
In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to pinpoint the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (PDFs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs differ. In this paper, we model this comparison as a multiple testing problem. First, we partition the sample space into small bins. In each bin we form a hypothesis to test the existence of differential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor differential pdfs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine- to coarse-resolution. The procedure achieves the statistical goal of pinpointing differential pdfs to the smallest possible regions. TEAM is computationally efficient, capable of analyzing large flow cytometry data sets in much shorter time compared with competing methods. We applied TEAM and competing methods on a flow cytometry data set to identify T cells responsive to the cytomeglovirus (CMV)-pp65 antigen stimulation. TEAM successfully identified the monofunctional, bifunctional, and polyfunctional T cells while the competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.
△ Less
Submitted 26 March, 2021; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Assimilation of semi-qualitative sea ice thickness data with the EnKF-SQ
Authors:
Abhishek Shah,
Laurent Bertino,
Francois Counillon,
Mohamad El Gharamti,
Jiping Xie
Abstract:
A newly introduced stochastic data assimilation method, the Ensemble Kalman Filter Semi-Qualitative (EnKF-SQ) is applied to a realistic coupled ice-ocean model of the Arctic, the TOPAZ4 configuration, in a twin experiment framework. The method is shown to add value to range-limited thin ice thickness measurements, as obtained from passive microwave remote sensing, with respect to more trivial solu…
▽ More
A newly introduced stochastic data assimilation method, the Ensemble Kalman Filter Semi-Qualitative (EnKF-SQ) is applied to a realistic coupled ice-ocean model of the Arctic, the TOPAZ4 configuration, in a twin experiment framework. The method is shown to add value to range-limited thin ice thickness measurements, as obtained from passive microwave remote sensing, with respect to more trivial solutions like neglecting the out-of-range values or assimilating climatology instead. Some known properties inherent to the EnKF-SQ are evaluated: the tendency to draw the solution closer to the thickness threshold, the skewness of the resulting analysis ensemble and the potential appearance of outliers. The experiments show that none of these properties prove deleterious in light of the other sub-optimal characters of the sea ice data assimilation system used here (non-linearities, non-Gaussian variables, lack of strong coupling). The EnKF-SQ has a single tuning parameter that is adjusted for best performance of the system at hand. The sensitivity tests reveal that the results do not depend critically on the choice of this tuning parameter. The EnKF-SQ makes overall a valid approach for assimilating semi-qualitative observations into high-dimensional nonlinear systems.
△ Less
Submitted 29 April, 2019;
originally announced April 2019.
-
Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization
Authors:
Qilong Wang,
Jiangtao Xie,
Wangmeng Zuo,
Lei Zhang,
Peihua Li
Abstract:
Compared with global average pooling in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features, having potential for improving representation and generalization abilities of deep CNNs. However, integration of global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given deep features of h…
▽ More
Compared with global average pooling in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features, having potential for improving representation and generalization abilities of deep CNNs. However, integration of global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given deep features of high dimension and small sample size; (2) appropriate usage of geometry of covariances. To address these challenges, we propose a global Matrix Power Normalized COVariance (MPN-COV) Pooling. Our MPN-COV conforms to a robust covariance estimator, very suitable for scenario of high dimension and small sample size. It can also be regarded as Power-Euclidean metric between covariances, effectively exploiting their geometry. Furthermore, a global Gaussian embedding network is proposed to incorporate first-order statistics into MPN-COV. For fast training of MPN-COV networks, we implement an iterative matrix square root normalization, avoiding GPU unfriendly eigen-decomposition inherent in MPN-COV. Additionally, progressive 1x1 convolutions and group convolution are introduced to compress covariance representations. The proposed methods are highly modular, readily plugged into existing deep CNNs. Extensive experiments are conducted on large-scale object classification, scene categorization, fine-grained visual recognition and texture classification, showing our methods outperform the counterparts and obtain state-of-the-art performance.
△ Less
Submitted 10 August, 2020; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Energy-Based Continuous Inverse Optimal Control
Authors:
Yifei Xu,
Jianwen Xie,
Tianyang Zhao,
Chris Baker,
Yibiao Zhao,
Ying Nian Wu
Abstract:
The problem of continuous inverse optimal control (over finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based model, where the observed expert trajectories are assumed to be random samples from a probability density function defined…
▽ More
The problem of continuous inverse optimal control (over finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based model, where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates (1) synthesis step: sample the synthesized trajectories from the current probability density using the Langevin dynamics via back-propagation through time, and (2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Given the fact that an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, where we replace the sampling in the synthesis step by optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the energy-based model simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to fast initialize the synthesis step of the energy-based model. We demonstrate the proposed methods on autonomous driving tasks, and show that they can learn suitable cost functions for optimal control.
△ Less
Submitted 18 April, 2022; v1 submitted 10 April, 2019;
originally announced April 2019.
-
Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion
Authors:
Ruiqi Gao,
Jianwen Xie,
Siyuan Huang,
Yufan Ren,
Song-Chun Zhu,
Ying Nian Wu
Abstract:
This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples the following two components: (1) the vector representations of local contents of images and (2) the matrix representations of local pixel displace…
▽ More
This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples the following two components: (1) the vector representations of local contents of images and (2) the matrix representations of local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene. When the image frame undergoes changes due to local pixel displacements, the vectors are multiplied by the matrices that represent the local displacements. Thus the vector representation is equivariant as it varies according to the local displacements. Our experiments show that our model can learn Gabor-like filter pairs of quadrature phases. The profiles of the learned filters match those of simple cells in Macaque V1. Moreover, we demonstrate that the model can learn to infer local motions in either a supervised or unsupervised manner. With such a simple model, we achieve competitive results on optical flow estimation.
△ Less
Submitted 5 April, 2022; v1 submitted 24 January, 2019;
originally announced February 2019.