-
Network Model Averaging Prediction for Latent Space Models by K-Fold Edge Cross-Validation
Authors:
Yan Zhang,
Jun Liao,
Xinyan Fan,
Kuangnan Fang,
Yuhong Yang
Abstract:
In complex systems, networks represent connectivity relationships between nodes through edges. Latent space models are crucial in analyzing network data for tasks like community detection and link prediction due to their interpretability and visualization capabilities. However, when the network size is relatively small, and the true latent space dimension is considerable, the parameters in latent space models may not be estimated very well. To address this issue, we propose a Network Model Averaging (NetMA) method tailored for latent space models with varying dimensions, specifically focusing on link prediction in networks. For both single-layer and multi-layer networks, we first establish the asymptotic optimality of the proposed averaging prediction in the sense of achieving the lowest possible prediction loss. Then we show that when the candidate models contain some correct models, our method assigns all weights to the correct models. Furthermore, we demonstrate the consistency of the NetMA-based weight estimator tending to the optimal weight vector. Extensive simulation studies show that NetMA performs better than simple averaging and model selection methods, and even outperforms the "oracle" method when the real latent space dimension is relatively large. Evaluation on collaboration and virtual event networks further emphasizes the competitiveness of NetMA in link prediction performance.
Submitted 28 May, 2025;
originally announced May 2025.
-
Leveraging Shared Factor Structures for Enhanced Matrix Completion with Nonconvex Penalty Regularization
Authors:
Yuanhong A,
Xinyan Fan,
Bingyi Jing,
Bo Zhang
Abstract:
This article investigates the problem of noisy low-rank matrix completion with a shared factor structure, leveraging the auxiliary information from the missing indicator matrix to enhance prediction accuracy. Despite decades of development in matrix completion, the potential relationship between observed data and missing indicators has largely been overlooked. To address this gap, we propose a joint modeling framework for the observed data and missing indicators within the context of a generalized factor model and derive the asymptotic limit distribution of the estimators. Furthermore, to tackle the rank estimation problem for model specification, we employ matrix nonconvex penalty regularization and establish nonasymptotic probability guarantees for the Oracle property. The theoretical results are validated through extensive simulation studies and real-world data analysis, demonstrating the effectiveness of the proposed method.
Submitted 4 April, 2025;
originally announced April 2025.
-
Causal Inference under Interference: Regression Adjustment and Optimality
Authors:
Xinyuan Fan,
Chenlei Leng,
Weichi Wu
Abstract:
In randomized controlled trials without interference, regression adjustment is widely used to enhance the efficiency of treatment effect estimation. This paper extends this efficiency principle to settings with network interference, where a unit's response may depend on the treatments assigned to its neighbors in a network. We make three key contributions: (1) we establish a central limit theorem for a linear regression-adjusted estimator and prove its optimality in achieving the smallest asymptotic variance within a class of linear adjustments; (2) we develop a novel, consistent estimator for the asymptotic variance of this linear estimator; and (3) we propose a nonparametric estimator that integrates kernel smoothing and trimming techniques, demonstrating its asymptotic normality and its optimality in minimizing asymptotic variance within a broader class of nonlinear adjustments. Extensive simulations validate the superior performance of our estimators, and a real-world data application illustrates their practical utility. Our findings underscore the power of regression-based methods and reveal the potential of kernel-and-trimming-based approaches for further enhancing efficiency under network interference.
Submitted 17 February, 2025; v1 submitted 9 February, 2025;
originally announced February 2025.
-
Low-Rank Approaches to Graphon Learning in Networks
Authors:
Xinyuan Fan,
Feiyan Ma,
Chenlei Leng,
Weichi Wu
Abstract:
The graphon is a powerful framework for modeling large-scale networks, but its estimation remains a significant challenge. In this paper, we propose a novel approach that directly leverages a low-rank representation of the graphon for parsimonious modeling. This representation naturally yields both a low-rank connection probability matrix and a low-rank graphon -- two tasks that are often infeasible in existing literature -- while also addressing the well-known identification issues in graphon estimation. By exploiting the additive structure of this representation, we develop an efficient sequential learning algorithm that estimates the low-rank connection matrix using subgraph counts and reconstructs the graphon function through interpolation. We establish the consistency of the proposed method and demonstrate its computational efficiency and estimation accuracy through extensive simulation studies.
Submitted 30 January, 2025;
originally announced January 2025.
-
Learning to Bid in Non-Stationary Repeated First-Price Auctions
Authors:
Zihao Hu,
Xiaoyu Fan,
Yuan Yao,
Jiheng Zhang,
Zhengyuan Zhou
Abstract:
First-price auctions have recently gained significant traction in digital advertising markets, exemplified by Google's transition from second-price to first-price auctions. Unlike in second-price auctions, where bidding one's private valuation is a dominant strategy, determining an optimal bidding strategy in first-price auctions is more complex. From a learning perspective, the learner (a specific bidder) can interact with the environment (other bidders) sequentially to infer their behaviors. Existing research often assumes specific environmental conditions and benchmarks performance against the best fixed policy (static benchmark). While this approach ensures strong learning guarantees, the static benchmark can deviate significantly from the optimal strategy in environments with even mild non-stationarity. To address such scenarios, a dynamic benchmark, which represents the sum of the best possible rewards at each time step, offers a more suitable objective. However, achieving no-regret learning with respect to the dynamic benchmark requires additional constraints. By inspecting reward functions in online first-price auctions, we introduce two metrics to quantify the regularity of the bidding sequence, which serve as measures of non-stationarity. We provide a minimax-optimal characterization of the dynamic regret when either of these metrics is sub-linear in the time horizon.
Submitted 22 January, 2025;
originally announced January 2025.
-
Nonstationary Sparse Spectral Permanental Process
Authors:
Zicheng Sun,
Yixuan Zhang,
Zenan Ling,
Xuhui Fan,
Feng Zhou
Abstract:
Existing permanental processes often impose constraints on kernel types or stationarity, limiting the model's expressiveness. To overcome these limitations, we propose a novel approach utilizing the sparse spectral representation of nonstationary kernels. This technique relaxes the constraints on kernel types and stationarity, allowing for more flexible modeling while reducing computational complexity to the linear level. Additionally, we introduce a deep kernel variant by hierarchically stacking multiple spectral feature mappings, further enhancing the model's expressiveness to capture complex patterns in data. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our approach, particularly in scenarios with pronounced data nonstationarity. Additionally, ablation studies are conducted to provide insights into the impact of various hyperparameters on model performance.
Submitted 18 December, 2024; v1 submitted 4 October, 2024;
originally announced October 2024.
-
Generalized Bayesian nonparametric clustering framework for high-dimensional spatial omics data
Authors:
Bencong Zhu,
Guanyu Hu,
Xiaodan Fan,
Qiwei Li
Abstract:
The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has transformed genomic research by enabling high-throughput gene expression profiling while preserving spatial context. Identifying spatial domains within SRT data is a critical task, with numerous computational approaches currently available. However, most existing methods rely on a multi-stage process that involves ad-hoc dimension reduction techniques to manage the high dimensionality of SRT data. These low-dimensional embeddings are then subjected to model-based or distance-based clustering methods. Additionally, many approaches depend on arbitrarily specifying the number of clusters (i.e., spatial domains), which can result in information loss and suboptimal downstream analysis. To address these limitations, we propose a novel Bayesian nonparametric mixture of factor analysis (BNPMFA) model, which incorporates a Markov random field-constrained Gibbs-type prior for partitioning high-dimensional spatial omics data. This new prior effectively integrates the spatial constraints inherent in SRT data while simultaneously inferring cluster membership and determining the optimal number of spatial domains. We have established the theoretical identifiability of cluster membership within this framework. The efficacy of our proposed approach is demonstrated through realistic simulations and applications to two SRT datasets. Our results show that the BNPMFA model not only surpasses state-of-the-art methods in clustering accuracy and estimating the number of clusters but also offers novel insights for identifying cellular regions within tissue samples.
Submitted 26 August, 2024;
originally announced August 2024.
-
Polarization Holes as an Indicator of Magnetic Field-Angular Momentum Alignment I. Initial Tests
Authors:
Lijun Wang,
Zhuo Cao,
Xiaodan Fan,
Hua-bai Li
Abstract:
The formation of protostellar disks is still a mystery, largely due to the difficulties in observations that can constrain theories. For example, the 3D alignment between the rotation of the disk and the magnetic fields (B-fields) in the formation environment is critical in some models, but so far impossible to observe. Here, we study the possibility of probing the alignment between B-field and disk rotation using "polarization holes" (PHs). PHs are widely observed and are caused by unresolved B-field structures. With ideal magnetohydrodynamic (MHD) simulations, we demonstrate that different initial alignments between B-field and angular momentum (AM) can result in B-field structures that are distinct enough to produce distinguishable PHs. Thus PHs can potentially serve as probes for alignments between B-field and AM in disk formation.
Submitted 22 March, 2024;
originally announced March 2024.
-
Random Interval Distillation for Detection of Change-Points in Markov Chain Bernoulli Networks
Authors:
Xinyuan Fan,
Weichi Wu
Abstract:
We propose a new and generic approach for detecting multiple change-points in dynamic networks with Markov formation, termed random interval distillation (RID). By collecting random intervals with sufficient signal strength and reassembling them into a sequence of informative short intervals, together with sparse universal singular value thresholding, our approach achieves near-minimax optimality, matching its independent counterparts, for both detection and localization bounds in low-rank networks without any prior knowledge of minimal spacing, unlike many previous methods. In particular, motivated by a recent nonasymptotic bound, our method uses the operator norm of CUSUMs of the adjacency matrices, and achieves the aforementioned optimality without the sample splitting required by the previous method. For practical applications, we introduce a clustering-based, data-driven procedure to determine the optimal threshold for signal strength, utilizing the connection between RID and clustering. We examine the effectiveness and usefulness of our methodology via simulations and a real data example.
Submitted 27 October, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Decomposition with Monotone B-splines: Fitting and Testing
Authors:
Lijun Wang,
Xiaodan Fan,
Hongyu Zhao,
Jun S. Liu
Abstract:
A univariate continuous function can always be decomposed as the sum of a non-increasing function and a non-decreasing one. Based on this property, we propose a non-parametric regression method that combines two spline-fitted monotone curves. We demonstrate by extensive simulations that, compared to standard spline-fitting methods, the proposed approach is particularly advantageous in high-noise scenarios. Several theoretical guarantees are established for the proposed approach. Additionally, we present statistics to test the monotonicity of a function based on monotone decomposition, which can better control Type I error and achieve comparable (if not always higher) power compared to existing methods. Finally, we apply the proposed fitting and testing approaches to analyze the single-cell pseudotime trajectory datasets, identifying significant biological insights for non-monotonically expressed genes through Gene Ontology enrichment analysis. The source code implementing the methodology and producing all results is accessible at https://github.com/szcf-weiya/MonotoneDecomposition.jl.
Submitted 9 April, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
Bayesian Nonparametric Clustering with Feature Selection for Spatially Resolved Transcriptomics Data
Authors:
Bencong Zhu,
Guanyu Hu,
Yang Xie,
Lin Xu,
Xiaodan Fan,
Qiwei Li
Abstract:
The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Nevertheless, there are inherent challenges associated with these new high-dimensional spatial data, such as zero-inflation, over-dispersion, and heterogeneity. These challenges pose obstacles to effective clustering, which is a fundamental problem in SRT data analysis. Current computational approaches often rely on heuristic data preprocessing and arbitrary cluster number prespecification, leading to considerable information loss and consequently, suboptimal downstream analysis. In response to these challenges, we introduce BNPSpace, a novel Bayesian nonparametric spatial clustering framework that directly models SRT count data. BNPSpace facilitates the partitioning of the whole spatial domain, which is characterized by substantial heterogeneity, into homogeneous spatial domains with similar molecular characteristics while identifying a parsimonious set of discriminating genes among different spatial domains. Moreover, BNPSpace incorporates spatial information through a Markov random field prior model, encouraging a smooth and biologically meaningful partition pattern.
Submitted 13 December, 2023;
originally announced December 2023.
-
Masked Transformer for Electrocardiogram Classification
Authors:
Ya Zhou,
Xiaolin Diao,
Yanni Huo,
Yang Liu,
Xiaohan Fan,
Wei Zhao
Abstract:
The electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of the Transformer for ECG data has not been fully realized, despite its widespread success in computer vision and natural language processing. In this work, we present Masked Transformer for ECG classification (MTECG), a simple yet effective method which significantly outperforms recent state-of-the-art algorithms in ECG classification. Our approach adapts the image-based masked autoencoders to self-supervised representation learning from ECG time series. We utilize a lightweight Transformer for the encoder and a 1-layer Transformer for the decoder. The ECG signal is split into a sequence of non-overlapping segments along the time dimension, and learnable positional embeddings are added to preserve the sequential information. We construct the Fuwai dataset comprising 220,251 ECG recordings with a broad range of diagnoses, annotated by medical experts, to explore the potential of the Transformer. A strong pre-training and fine-tuning recipe is proposed from the empirical study. The experiments demonstrate that the proposed method increases the macro F1 scores by 3.4%-27.5% on the Fuwai dataset, 9.9%-32.0% on the PTB-XL dataset, and 9.4%-39.1% on a multicenter dataset, compared to the alternative methods. We hope that this study could direct future research on the application of the Transformer to more ECG tasks.
Submitted 22 April, 2024; v1 submitted 31 August, 2023;
originally announced September 2023.
-
Degrees of Freedom: Search Cost and Self-consistency
Authors:
Lijun Wang,
Hongyu Zhao,
Xiaodan Fan
Abstract:
Model degrees of freedom (df) is a fundamental concept in statistics because it quantifies the flexibility of a fitting procedure and is indispensable in model selection. To investigate the gap between df and the number of independent variables in the fitting procedure, Tibshirani (2015) introduced the search degrees of freedom (sdf) concept to account for the search cost during model selection. However, this definition has two limitations: it does not consider fitting procedures in augmented spaces and does not use the same fitting procedure for sdf and df. We propose a modified search degrees of freedom (msdf) to directly account for the cost of searching in either original or augmented spaces. We check this definition for various fitting procedures, including classical linear regressions, spline methods, adaptive regressions (the best subset and the lasso), regression trees, and multivariate adaptive regression splines (MARS). In many scenarios when sdf is applicable, msdf reduces to sdf. However, for certain procedures like the lasso, msdf offers a fresh perspective on search costs. For some complex procedures like MARS, the df has been pre-determined during model fitting, but the df of the final fitted procedure might differ from the pre-determined one. To investigate this discrepancy, we introduce the concepts of nominal df and actual df, and define the property of self-consistency, which occurs when there is no gap between these two df's. We propose a correction procedure for MARS to align these two df's, demonstrating improved fitting performance through extensive simulations and two real data applications.
Submitted 19 July, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Monotone Cubic B-Splines with a Neural-Network Generator
Authors:
Lijun Wang,
Xiaodan Fan,
Huabai Li,
Jun S. Liu
Abstract:
We present a method for fitting monotone curves using cubic B-splines, which is equivalent to putting a monotonicity constraint on the coefficients. We explore different ways of enforcing this constraint and analyze their theoretical and empirical properties. We propose two algorithms for solving the spline fitting problem: one that uses standard optimization techniques and one that trains a multi-layer perceptron (MLP) generator to approximate the solutions under various settings and perturbations. The generator approach can speed up the fitting process when we need to solve the problem repeatedly, such as when constructing confidence bands using bootstrap. We evaluate our method against several existing methods, some of which do not use the monotonicity constraint, on some monotone curves with varying noise levels. We demonstrate that our method outperforms the other methods, especially in high-noise scenarios. We also apply our method to analyze the polarization-hole phenomenon during star formation in astrophysics. The source code is accessible at https://github.com/szcf-weiya/MonotoneSplines.jl.
Submitted 17 November, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
A Geometric Statistic for Quantifying Correlation Between Tree-Shaped Datasets
Authors:
Shanjun Mao,
Xiaodan Fan,
Jie Hu
Abstract:
The magnitude of the Pearson correlation between two scalar random variables can be visually judged from the two-dimensional scatter plot of an independent and identically distributed sample drawn from the joint distribution of the two variables: the closer the points lie to a straight slanting line, the greater the correlation. To the best of our knowledge, no similar graphical representation or geometric quantification of tree correlation exists in the literature, although tree-shaped datasets are frequently encountered in various fields, such as academic genealogy trees and embryonic development trees. In this paper, we introduce a geometric statistic to both represent tree correlation intuitively and quantify its magnitude precisely. The theoretical properties of the geometric statistic are provided. Large-scale simulations based on various data distributions demonstrate that the geometric statistic is precise in measuring the tree correlation. An application to real mathematical genealogy trees also demonstrates its usefulness.
Submitted 11 March, 2023;
originally announced March 2023.
-
Bayesian Inference of Gene Expression Dynamics in Alzheimer Brains
Authors:
Shanjun Mao,
Xiaodan Fan
Abstract:
Alzheimer's disease (AD) is a serious neurodegenerative disease that progresses through four stages of increasing severity. Detecting how the gene regulatory mechanism changes as AD progresses is of great significance, because it helps us better understand the causes of AD and find ways to treat or control it. Numerous studies have addressed this problem. However, most methods process the data region by region of the brain and stage by stage of AD, and then compare the results to detect changes. It is unclear how to combine the three dimensions, i.e., gene, region, and stage, simultaneously to study the gene expression dynamics of AD. This is the motivation of our research. We propose a statistical model of increments to clarify the relationship between gene expression in adjacent stages, so that we can better estimate the missing data and obtain a complete, reasonable dynamic regulatory network model. Simulations are conducted to validate the statistical power of our algorithm. Moreover, a real data analysis shows that our method can capture the dynamic gene regulatory relationships in these complex brain data.
Submitted 11 March, 2023;
originally announced March 2023.
-
Vine dependence graphs with latent variables as summaries for gene expression data
Authors:
Xinyao Fan,
Harry Joe,
Yongjin Park
Abstract:
The advent of high-throughput sequencing technologies has led to vast comparative genome sequences. The construction of gene-gene interaction networks or dependence graphs on the genome scale is vital for understanding the regulation of biological processes. Different dependence graphs can provide different information.
Some existing methods for dependence graphs based on high-order partial correlations are sparse and not informative when there are latent variables that can explain much of the dependence in groups of genes. Other methods of dependence graphs based on correlations and first-order partial correlations might have dense graphs. When genes can be divided into groups with stronger within group dependence in gene expression than between group dependence, we present a dependence graph based on truncated vines with latent variables that makes use of group information and low-order partial correlations. The graphs are not dense, and the genes that might be more central have more neighbors in the vine dependency graph. We demonstrate the use of our dependence graph construction on two RNA-seq data sets -- yeast and prostate cancer. There is some biological evidence to support the relationship between genes in the resulting dependence graphs.
A flexible framework is provided for building dependence graphs via low-order partial correlations and formation of groups, leading to graphs that are not too sparse or dense. We anticipate that this approach will help to identify groups that might be central to different biological functions.
Submitted 2 March, 2023;
originally announced March 2023.
-
Free-Form Variational Inference for Gaussian Process State-Space Models
Authors:
Xuhui Fan,
Edwin V. Bonilla,
Terence J. O'Kane,
Scott A. Sisson
Abstract:
Gaussian process state-space models (GPSSMs) provide a principled and flexible approach to modeling the dynamics of a latent state, which is observed at discrete-time points via a likelihood model. However, inference in GPSSMs is computationally and statistically challenging due to the large number of latent variables in the model and the strong temporal dependencies between them. In this paper, we propose a new method for inference in Bayesian GPSSMs, which overcomes the drawbacks of previous approaches, namely over-simplified assumptions, and high computational requirements. Our method is based on free-form variational inference via stochastic gradient Hamiltonian Monte Carlo within the inducing-variable formalism. Furthermore, by exploiting our proposed variational distribution, we provide a collapsed extension of our method where the inducing variables are marginalized analytically. We also showcase results when combining our framework with particle MCMC methods. We show that, on six real-world datasets, our approach can learn transition dynamics and latent states more accurately than competing methods.
Submitted 16 July, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space
Authors:
Xiran Fan,
Chun-Hao Yang,
Baba C. Vemuri
Abstract:
Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics for decision boundaries in a large margin classifiers setting leading to a non-convex optimization problem. In this paper, we propose a novel large margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem that can be optimized using any Riemannian gradient descent technique guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.
Submitted 28 September, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Robust structured heterogeneity analysis approach for high-dimensional data
Authors:
Yifan Sun,
Ziye Luo,
Xinyan Fan
Abstract:
Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies. This problem has been challenged by the heterogeneity of diseases. Patients with what is perceived as the same disease may form multiple subgroups, and different subgroups have distinct sets of important genes. It is hence imperative to discover the latent subgroups and reveal the subgroup-specific important genes. Some heterogeneity analysis methods have been proposed in recent literature. Despite considerable successes, most of the existing studies are still limited as they cannot accommodate data contamination and ignore the interconnections among genes. To address these shortcomings, we develop a robust structured heterogeneity analysis approach to identify subgroups, select important genes, and estimate their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed to conduct regularization estimation and gene identification, while taking into account the possibly overlapping cluster structure of genes. This approach takes an iterative strategy in a spirit similar to K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup. The analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.
Submitted 28 November, 2022;
originally announced November 2022.
-
Regression-based heterogeneity analysis to identify overlapping subgroup structure in high-dimensional data
Authors:
Ziye Luo,
Xinyue Yao,
Yifan Sun,
Xinyan Fan
Abstract:
Heterogeneity is a hallmark of complex diseases. Regression-based heterogeneity analysis, which is directly concerned with outcome-feature relationships, has led to a deeper understanding of disease biology. Such an analysis identifies the underlying subgroup structure and estimates the subgroup-specific regression coefficients. However, most of the existing regression-based heterogeneity analyses can only address disjoint subgroups; that is, each sample is assigned to only one subgroup. In reality, some samples have multiple labels: for example, many genes have several biological functions, and some cells of pure cell types transition into other types over time. This suggests that their outcome-feature relationships (regression coefficients) can be a mixture of the relationships in more than one subgroup, and as a result, the disjoint subgrouping results can be unsatisfactory. To this end, we develop a novel approach to regression-based heterogeneity analysis, which takes into account possible overlaps between subgroups and high data dimensions. A subgroup membership vector is introduced for each sample, which is combined with a loss function. Considering the lack of information arising from small sample sizes, an $l_2$ norm penalty is developed for each membership vector to encourage similarity in its elements. A sparse penalization is also applied for regularized estimation and feature selection. Extensive simulations demonstrate its superiority over direct competitors. The analysis of Cancer Cell Line Encyclopedia data and lung cancer data from The Cancer Genome Atlas shows that the proposed approach can identify an overlapping subgroup structure with favorable performance in prediction and stability.
Submitted 28 November, 2022;
originally announced November 2022.
-
High-dimensional factor copula models with estimation of latent variables
Authors:
Xinyao Fan,
Harry Joe
Abstract:
Factor models are a parsimonious way to explain the dependence of variables using several latent variables. In Gaussian 1-factor and structural factor models (such as bi-factor, oblique factor) and their factor copula counterparts, factor scores or proxies are defined as conditional expectations of latent variables given the observed variables. With mild assumptions, the proxies are consistent for corresponding latent variables as the sample size and the number of observed variables linked to each latent variable go to infinity. When the bivariate copulas linking observed variables to latent variables are not assumed in advance, sequential procedures are used for latent variables estimation, copula family selection and parameter estimation. The use of proxy variables for factor copulas means that approximate log-likelihoods can be used to estimate copula parameters with less computational effort for numerical integration.
Submitted 28 May, 2022;
originally announced May 2022.
-
Mutual Influence Regression Model
Authors:
Xinyan Fan,
Wei Lan,
Tao Zou,
Chih-Ling Tsai
Abstract:
In this article, we propose the mutual influence regression model (MIR) to establish the relationship between the mutual influence matrix of actors and a set of similarity matrices induced by their associated attributes. This model is able to explain the heterogeneous structure of the mutual influence matrix by extending the commonly used spatial autoregressive model while allowing it to change with time. To facilitate making inferences with MIR, we establish parameter estimation, weight matrices selection and model testing. Specifically, we employ the quasi-maximum likelihood estimation method to estimate unknown regression coefficients, and demonstrate that the resulting estimator is asymptotically normal without imposing the normality assumption and while allowing the number of similarity matrices to diverge. In addition, an extended BIC-type criterion is introduced for selecting relevant matrices from the divergent number of similarity matrices. To assess the adequacy of the proposed model, we further propose an influence matrix test and develop a novel approach in order to obtain the limiting distribution of the test. Finally, we extend the model to accommodate endogenous weight matrices, exogenous covariates, and both individual and time fixed effects, to broaden the usefulness of MIR. The simulation studies support our theoretical findings, and a real example is presented to illustrate the usefulness of the proposed MIR model.
Submitted 15 May, 2022;
originally announced May 2022.
-
Covariance Model with General Linear Structure and Divergent Parameters
Authors:
Xinyan Fan,
Wei Lan,
Tao Zou,
Chih-Ling Tsai
Abstract:
For estimating the large covariance matrix with a limited sample size, we propose the covariance model with general linear structure (CMGL) by employing the general link function to connect the covariance of the continuous response vector to a linear combination of weight matrices. Without assuming the distribution of responses, and allowing the number of parameters associated with weight matrices to diverge, we obtain the quasi-maximum likelihood estimators (QMLE) of parameters and show their asymptotic properties. In addition, an extended Bayesian information criterion (EBIC) is proposed to select relevant weight matrices, and the consistency of EBIC is demonstrated. Under the identity link function, we introduce the ordinary least squares estimator (OLS), which has a closed form. Hence, its computational burden is reduced compared to QMLE, and the theoretical properties of OLS are also investigated. To assess the adequacy of the link function, we further propose the quasi-likelihood ratio test and obtain its limiting distribution. Simulation studies are presented to assess the performance of the proposed methods, and the usefulness of generalized covariance models is illustrated by an analysis of the US stock market.
Submitted 14 May, 2022;
originally announced May 2022.
-
Robust Regularized Low-Rank Matrix Models for Regression and Classification
Authors:
Hsin-Hsiung Huang,
Feng Yu,
Xing Fan,
Teng Zhang
Abstract:
While matrix variate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional and noisy matrix-valued predictors. To address these issues, this paper proposes a framework of matrix variate regression models based on a rank constraint, vector regularization (e.g., sparsity), and a general loss function with three special cases considered: ordinary matrix regression, robust matrix regression, and matrix logistic regression. We also propose an alternating projected gradient descent algorithm. Based on analyzing our objective functions on manifolds with bounded curvature, we show that the algorithm is guaranteed to converge and that all accumulation points of the iterates have estimation errors of order $O(1/\sqrt{n})$ asymptotically, substantially attaining the minimax rate. Our theoretical analysis can be applied to general optimization problems on manifolds with bounded curvature and can be considered an important technical contribution of this work. We validate the proposed method through simulation studies and real image data examples.
Submitted 14 May, 2022;
originally announced May 2022.
-
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design
Authors:
Xiran Fan,
Chun-Hao Yang,
Baba C. Vemuri
Abstract:
Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection which is designed to project data on to a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this ensuing nested hyperbolic space representation with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.
Submitted 2 December, 2021;
originally announced December 2021.
-
Alignment Attention by Matching Key and Query Distributions
Authors:
Shujian Zhang,
Xinjie Fan,
Huangjie Zheng,
Korawat Tanwisuth,
Mingyuan Zhou
Abstract:
The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains. Most such models use multi-head self-attention which is appealing for the ability to attend to information from different perspectives. This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head. The resulting alignment attention networks can be optimized as an unsupervised regularization in the existing attention framework. It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention. On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our approach on graph attention and visual question answering, showing the great potential of incorporating our alignment method into various attention-related tasks.
Submitted 24 October, 2021;
originally announced October 2021.
-
A Prototype-Oriented Framework for Unsupervised Domain Adaptation
Authors:
Korawat Tanwisuth,
Xinjie Fan,
Huangjie Zheng,
Shujian Zhang,
Hao Zhang,
Bo Chen,
Mingyuan Zhou
Abstract:
Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space. To avoid the sampling variability, class imbalance, and data-privacy concerns that often plague these methods, we instead provide a memory and computation-efficient probabilistic framework to extract class prototypes and align the target features with them. We demonstrate the general applicability of our method on a wide range of scenarios, including single-source, multi-source, class-imbalance, and source-private domain adaptation. Requiring no additional model parameters and having a moderate increase in computation over the source model alone, the proposed method achieves competitive performance with state-of-the-art methods.
Submitted 22 October, 2021;
originally announced October 2021.
-
Self-normalized Cramer moderate deviations for a supercritical Galton-Watson process
Authors:
Xiequan Fan,
Qi-Man Shao
Abstract:
Let $(Z_n)_{n\geq0}$ be a supercritical Galton-Watson process. Consider the Lotka-Nagaev estimator for the offspring mean. In this paper, we establish self-normalized Cramér type moderate deviations and Berry-Esseen's bounds for the Lotka-Nagaev estimator. The results are believed to be optimal or near optimal.
Submitted 16 July, 2021;
originally announced July 2021.
-
Bayesian Attention Belief Networks
Authors:
Shujian Zhang,
Xinjie Fan,
Bo Chen,
Mingyuan Zhou
Abstract:
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention while stochastic attention is less explored due to the optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions with a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. It is simple to convert any models with deterministic attention, including pretrained ones, to the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our method on neural machine translation and visual question answering, showing great potential of incorporating our method into various attention-related tasks.
Submitted 9 June, 2021;
originally announced June 2021.
-
Maximizing Mutual Information Across Feature and Topology Views for Learning Graph Representations
Authors:
Xiaolong Fan,
Maoguo Gong,
Yue Wu,
Hao Li
Abstract:
Recently, maximizing mutual information has emerged as a powerful method for unsupervised graph representation learning. The existing methods are typically effective at capturing information from the topology view but ignore the feature view. To circumvent this issue, we propose a novel approach by exploiting mutual information maximization across feature and topology views. Specifically, we first utilize a multi-view representation learning module to better capture both local and global information content across feature and topology views on graphs. To model the information shared by the feature and topology spaces, we then develop a common representation learning module using mutual information maximization and reconstruction loss minimization. To explicitly encourage diversity between graph representations from the same view, we also introduce a disagreement regularization to enlarge the distance between representations from the same view. Experiments on synthetic and real-world datasets demonstrate the effectiveness of integrating feature and topology views. In particular, compared with the previous supervised methods, our proposed method can achieve comparable or even better performance under the unsupervised representation and linear evaluation protocol.
Submitted 11 October, 2022; v1 submitted 14 May, 2021;
originally announced May 2021.
-
ALMA: Alternating Minimization Algorithm for Clustering Mixture Multilayer Network
Authors:
Xing Fan,
Marianna Pensky,
Feng Yu,
Teng Zhang
Abstract:
The paper considers a Mixture Multilayer Stochastic Block Model (MMLSBM), where layers can be partitioned into groups of similar networks, and networks in each group are equipped with a distinct Stochastic Block Model. The goal is to partition the multilayer network into clusters of similar layers, and to identify communities in those layers. Jing et al. (2020) introduced the MMLSBM and developed a clustering methodology, TWIST, based on regularized tensor decomposition. The present paper proposes a different technique, an alternating minimization algorithm (ALMA), that aims at simultaneous recovery of the layer partition, together with estimation of the matrices of connection probabilities of the distinct layers. Compared to TWIST, ALMA achieves higher accuracy both theoretically and numerically.
Submitted 12 October, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Deviation inequalities for stochastic approximation by averaging
Authors:
Xiequan Fan,
Pierre Alquier,
Paul Doukhan
Abstract:
We introduce a class of Markov chains that contains the models of stochastic approximation by averaging and non-averaging. Using the martingale approximation method, we establish various deviation inequalities for separately Lipschitz functions of such a chain, with different moment conditions on some dominating random variables of martingale differences. Finally, we apply these inequalities to the stochastic approximation by averaging and empirical risk minimisation.
Submitted 18 February, 2022; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Bayesian Attention Modules
Authors:
Xinjie Fan,
Shujian Zhang,
Bo Chen,
Mingyuan Zhou
Abstract:
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
Submitted 20 October, 2020;
originally announced October 2020.
-
Attention that does not Explain Away
Authors:
Nan Ding,
Xinjie Fan,
Zhenzhong Lan,
Dale Schuurmans,
Radu Soricut
Abstract:
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks. A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances. Following a probabilistic view of the attention via the Gaussian mixture model, we find empirical evidence that the Transformer attention tends to "explain away" certain input neurons. To compensate for this, we propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect without introducing significant computational or memory cost. Empirically, we show that the new attention schemes result in improved performance on several well-known benchmarks.
Submitted 29 September, 2020;
originally announced September 2020.
-
Review of Machine-Learning Methods for RNA Secondary Structure Prediction
Authors:
Qi Zhao,
Zheng Zhao,
Xiaoya Fan,
Zhengwei Yuan,
Qian Mao,
Yudong Yao
Abstract:
Secondary structure plays an important role in determining the function of non-coding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine-learning technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on machine-learning technologies and a tabularized summary of the most important methods in this field. The current pending issues in the field of RNA secondary structure prediction and future trends are also discussed.
Submitted 31 August, 2020;
originally announced September 2020.
-
Appearance-free Tripartite Matching for Multiple Object Tracking
Authors:
Lijun Wang,
Yanting Zhu,
Jue Shi,
Xiaodan Fan
Abstract:
Multiple Object Tracking (MOT) detects the trajectories of multiple objects given an input video. It has become more and more important for various research and industry areas, such as cell tracking for biomedical research and human tracking in video surveillance. Most existing algorithms depend on the uniqueness of the object's appearance, and the dominating bipartite matching scheme ignores the speed smoothness. Although several methods have incorporated the velocity smoothness for tracking, they either fail to pursue global smooth velocity or are often trapped in local optima. We focus on the general MOT problem regardless of the appearance and propose an appearance-free tripartite matching to avoid the irregular velocity problem of the bipartite matching. The tripartite matching is formulated as maximizing the likelihood of the state vectors constituted of the position and velocity of objects, which results in a chain-dependent structure. We resort to the dynamic programming algorithm to find such a maximum likelihood estimate. To overcome the high computational cost induced by the vast search space of dynamic programming when many objects are to be tracked, we decompose the space by the number of disappearing objects and propose a reduced-space approach by truncating the decomposition. Extensive simulations have shown the superiority and efficiency of our proposed method, and the comparisons with top methods on Cell Tracking Challenge also demonstrate our competence. We also applied our method to track the motion of natural killer cells around tumor cells in a cancer study. The source code is available at https://github.com/szcf-weiya/TriMatchMOT
Submitted 7 October, 2021; v1 submitted 8 August, 2020;
originally announced August 2020.
-
Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations
Authors:
Weihao Gao,
Xiangjun Fan,
Chong Wang,
Jiankai Sun,
Kai Jia,
Wenzhi Xiao,
Ruofan Ding,
Xingyan Bin,
Hui Yang,
Xiaobing Liu
Abstract:
One of the core problems in large-scale recommendations is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model, and then use some approximate nearest neighbor (ANN) search algorithm to find top candidates. In this paper, we present Deep Retrieval (DR), to learn a retrievable structure directly with user-item interaction data (e.g. clicks) without resorting to the Euclidean space assumption in ANN algorithms. DR's structure encodes all candidate items into a discrete latent space. Those latent codes for the candidates are model parameters and learnt together with other neural network parameters to maximize the same objective function. With the model learnt, a beam search over the structure is performed to retrieve the top candidates for reranking. Empirically, we first demonstrate that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline on two public datasets. Moreover, we show that, in a live production recommendation system, a deployed DR approach significantly outperforms a well-tuned ANN baseline in terms of engagement metrics. To the best of our knowledge, DR is among the first non-ANN algorithms successfully deployed at the scale of hundreds of millions of items for industrial recommendation systems.
Submitted 18 May, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Online Binary Space Partitioning Forests
Authors:
Xuhui Fan,
Bin Li,
Scott A. Sisson
Abstract:
The Binary Space Partitioning-Tree (BSP-Tree) process was recently proposed as an efficient strategy for space partitioning tasks. Because it uses more than one dimension to partition the space, the BSP-Tree process is more efficient and flexible than conventional axis-aligned cutting strategies. However, due to its batch learning setting, it is not well suited to large-scale classification and regression problems. In this paper, we develop an online BSP-Forest framework to address this limitation. With the arrival of new data, the resulting online algorithm can simultaneously expand the space coverage and refine the partition structure, with guaranteed universal consistency for both classification and regression problems. The effectiveness and competitive performance of the online BSP-Forest is verified via simulations on real-world datasets.
Submitted 29 February, 2020;
originally announced March 2020.
-
Bayesian Nonparametric Space Partitions: A Survey
Authors:
Xuhui Fan,
Bin Li,
Ling Luo,
Scott A. Sisson
Abstract:
Bayesian nonparametric space partition (BNSP) models provide a variety of strategies for partitioning a $D$-dimensional space into a set of blocks, such that data points lying in the same block share certain kinds of homogeneity. BNSP models can be applied to various areas, such as regression/classification trees, random feature construction, relational modeling, etc. In this survey, we investigate the current progress of BNSP research from the following three perspectives: models, which reviews various strategies for generating the partitions in the space and discusses their theoretical foundation, 'self-consistency'; applications, which covers the current mainstream usages of BNSP models and their potential future practices; and challenges, which identifies the current unsolved problems and valuable future research topics. As there has been no comprehensive review of the BNSP literature so far, we hope that this survey can induce further exploration and exploitation on this topic.
Submitted 28 February, 2021; v1 submitted 26 February, 2020;
originally announced February 2020.
-
Supervised Categorical Metric Learning with Schatten p-Norms
Authors:
Xuhui Fan,
Eric Gaussier
Abstract:
Metric learning has been successful in learning new metrics adapted to numerical datasets. However, its development on categorical data still needs further exploration. In this paper, we propose a method, called CPML for categorical projected metric learning, that tries to efficiently (i.e. with less computational time and better prediction accuracy) address the problem of metric learning in categorical data. We make use of the Value Distance Metric to represent our data and propose new distances based on this representation. We then show how to efficiently learn new metrics. We also generalize several previous regularizers through the Schatten $p$-norm and provide a generalization bound for it that complements the standard generalization bound for metric learning. Experimental results show that our method provides
Submitted 25 February, 2020;
originally announced February 2020.
-
Smoothing Graphons for Modelling Exchangeable Relational Data
Authors:
Xuhui Fan,
Yaqiong Li,
Ling Chen,
Bin Li,
Scott A. Sisson
Abstract:
Modelling exchangeable relational data can be described by graphon theory. Most Bayesian methods for modelling exchangeable relational data can be attributed to this framework by exploiting different forms of graphons. However, the graphons adopted by existing Bayesian methods are either piecewise-constant functions, which are insufficiently flexible for accurate modelling of the relational data, or are complicated continuous functions, which incur heavy computational costs for inference. In this work, we introduce a smoothing procedure to piecewise-constant graphons to form smoothing graphons, which permit continuous intensity values for describing relations, but without impractically increasing computational costs. In particular, we focus on the Bayesian Stochastic Block Model (SBM) and demonstrate how to adapt the piecewise-constant SBM graphon to the smoothed version. We initially propose the Integrated Smoothing Graphon (ISG) which introduces one smoothing parameter to the SBM graphon to generate continuous relational intensity values. We then develop the Latent Feature Smoothing Graphon (LFSG), which improves on the ISG by introducing auxiliary hidden labels to decompose the calculation of the ISG intensity and enable efficient inference. Experimental results on real-world data sets validate the advantages of applying smoothing strategies to the Stochastic Block Model, demonstrating that smoothing graphons can greatly improve AUC and precision for link prediction without increasing computational complexity.
Submitted 25 February, 2020;
originally announced February 2020.
-
Recurrent Dirichlet Belief Networks for Interpretable Dynamic Relational Data Modelling
Authors:
Yaqiong Li,
Xuhui Fan,
Ling Chen,
Bin Li,
Zheng Yu,
Scott A. Sisson
Abstract:
The Dirichlet Belief Network (DirBN) has been recently proposed as a promising approach in learning interpretable deep latent representations for objects. In this work, we leverage its interpretable modelling architecture and propose a deep dynamic probabilistic framework, the Recurrent Dirichlet Belief Network (Recurrent-DBN), to study interpretable hidden structures from dynamic relational data. The proposed Recurrent-DBN has the following merits: (1) it infers interpretable and organised hierarchical latent structures for objects within and across time steps; (2) it enables recurrent long-term temporal dependence modelling, which outperforms the first-order Markov descriptions in most of the dynamic probabilistic frameworks. In addition, we develop a new inference strategy, which first upward-and-backward propagates latent counts and then downward-and-forward samples variables, to enable efficient Gibbs sampling for the Recurrent-DBN. We apply the Recurrent-DBN to dynamic relational data problems. Extensive experimental results on real-world data validate the advantages of the Recurrent-DBN over the state-of-the-art models in interpretable latent structure discovery and improved link prediction performance.
Submitted 29 April, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel
Authors:
Zheng Yu,
Xuhui Fan,
Marcin Pietrasik,
Marek Reformat
Abstract:
The Mixed-Membership Stochastic Blockmodel (MMSB) is proposed as one of the state-of-the-art Bayesian relational methods suitable for learning the complex hidden structure underlying the network data. However, the current formulation of MMSB suffers from the following two issues: (1) prior information (e.g. entities' community structural information) cannot be well embedded in the modelling; (2) community evolution cannot be well described in the literature. Therefore, we propose a non-parametric fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model performs entity-based clustering to capture the community information for entities and linkage-based clustering to derive the group information for links simultaneously. Besides, the proposed model infers the network structure and models community evolution, manifested by appearances and disappearances of communities, using the discrete fragmentation coagulation process (DFCP). By integrating the community structure with the group compatibility matrix we derive a generalized version of MMSB. An efficient Gibbs sampling scheme with the Pólya-Gamma (PG) approach is implemented for posterior inference. We validate our model on synthetic and real-world data.
Submitted 17 January, 2020;
originally announced February 2020.
-
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Authors:
Xinjie Fan,
Yizhe Zhang,
Zhendong Wang,
Mingyuan Zhou
Abstract:
Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this method. To stabilize this method, we adapt to contextual generation of categorical sequences a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Due to the correlation, the number of unique rollouts is random and adaptive to model uncertainty; those rollouts naturally become baselines for each other, and hence are combined to effectively reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines.
Submitted 17 June, 2020; v1 submitted 30 December, 2019;
originally announced December 2019.
-
Short-term Load Forecasting with Dense Average Network
Authors:
Zhifang Liao,
Haihui Pan,
Qi Zeng,
Xiaoping Fan,
Yan Zhang,
Song Yu
Abstract:
As an important part of the power system, power load forecasting directly affects the national economy. The data shows that improving the load forecasting accuracy by 0.01% can save millions of dollars for the power industry. Therefore, improving the accuracy of power load forecasting has always been a goal pursued for power systems. Based on this goal, this paper proposes a novel connection, the dense average connection, in which the outputs of all preceding layers are averaged as the input of the next layer in a feed-forward fashion. Based on the dense average connection, we construct the dense average network for power load forecasting. The predictions of the proposed model for two public datasets are better than those of existing methods. On this basis, we use the ensemble method to further improve the accuracy of the model. To verify the reliability of the model predictions, the robustness is analyzed and verified by adding input disturbances. The experimental results show that the proposed model is effective and robust for power load forecasting.
Submitted 28 May, 2020; v1 submitted 8 December, 2019;
originally announced December 2019.
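The dense average connection described in the abstract above can be illustrated with a small pure-Python forward pass. This is a sketch under stated assumptions, not the authors' implementation: it assumes every layer has the same width as the input, that the raw input counts as the first "preceding output", and that each layer is an affine map followed by ReLU. All function names are hypothetical.

```python
def dense_layer(x, weights, bias):
    """Affine map followed by ReLU; `weights` is a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def elementwise_mean(vectors):
    """Average a list of equal-length vectors coordinate by coordinate."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

def dense_average_forward(x, layers):
    """Feed each layer the average of all preceding outputs.

    `layers` is a list of (weights, bias) pairs whose width equals len(x).
    Treating the input itself as the first preceding output is an assumption.
    """
    outputs = [x]
    for weights, bias in layers:
        layer_input = elementwise_mean(outputs)
        outputs.append(dense_layer(layer_input, weights, bias))
    return outputs[-1]

def identity_layer(width):
    """Helper for the example: identity weights and zero bias."""
    weights = [[1.0 if i == j else 0.0 for j in range(width)]
               for i in range(width)]
    return weights, [0.0] * width

# Two layers that each double their scalar input:
# layer 1 sees 1.0 and outputs 2.0; layer 2 sees mean(1.0, 2.0) = 1.5 and outputs 3.0.
result = dense_average_forward([1.0], [([[2.0]], [0.0]), ([[2.0]], [0.0])])
```

Averaging, unlike the concatenation used in densely connected networks, keeps the input width of every layer constant regardless of depth.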
-
Dynamic Connected Neural Decision Classifier and Regressor with Dynamic Softing Pruning
Authors:
Xinyu Fan
Abstract:
To deal with datasets of differing complexity, this paper presents a self-adaptive learning model that combines the proposed Dynamic Connected Neural Decision Networks (DNDN) and a new pruning method, Dynamic Soft Pruning (DSP). DNDN is a combination of random forests and deep neural networks that enjoys both the strong classification capability of a tree-like structure and the representation learning capability of a network structure. Based on Deep Neural Decision Forests (DNDF), this paper adopts an end-to-end training approach by representing the classification distribution with multiple randomly initialized softmax layers, which further allows an ensemble of multiple random forests attached to layers of the neural network at different depths. We also propose a soft pruning method, DSP, to adaptively reduce the redundant connections of the network and avoid over-fitting on simple datasets. The model demonstrates no performance loss compared with unpruned models and even higher robustness over different data and feature distributions. Extensive experiments on different datasets demonstrate the superiority of the proposed model over other popular algorithms in solving classification tasks.
Submitted 22 February, 2021; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Regression via Arbitrary Quantile Modeling
Authors:
Faen Zhang,
Xinyu Fan,
Hui Xu,
Pengcheng Zhou,
Yujian He,
Junlong Liu
Abstract:
In the regression problem, L1 and L2 are the most commonly used loss functions, which produce mean predictions with different biases. However, the predictions are neither robust nor adequate enough since they only capture a few conditional distributions instead of the whole distribution, especially for small datasets. To address this problem, we proposed arbitrary quantile modeling to regulate the prediction, which achieved better performance compared to traditional loss functions. More specifically, a new distribution regression method, Deep Distribution Regression (DDR), is proposed to estimate arbitrary quantiles of the response variable. Our DDR method consists of two models: a Q model, which predicts the corresponding value for arbitrary quantile, and an F model, which predicts the corresponding quantile for arbitrary value. Furthermore, the duality between Q and F models enables us to design a novel loss function for joint training and perform a dual inference mechanism. Our experiments demonstrate that our DDR-joint and DDR-disjoint methods outperform previous methods such as AdaBoost, random forest, LightGBM, and neural networks both in terms of mean and quantile prediction.
Submitted 13 November, 2019;
originally announced November 2019.
-
Scalable Deep Generative Relational Models with High-Order Node Dependence
Authors:
Xuhui Fan,
Bin Li,
Scott Anthony Sisson,
Caoyuan Li,
Ling Chen
Abstract:
We propose a probabilistic framework for modelling and exploring the latent structure of relational data. Given feature information for the nodes in a network, the scalable deep generative relational model (SDREM) builds a deep network architecture that can approximate potential nonlinear mappings between nodes' feature information and the nodes' latent representations. Our contribution is two-fold: (1) We incorporate high-order neighbourhood structure information to generate the latent representations at each node, which vary smoothly over the network. (2) Due to the Dirichlet random variable structure of the latent representations, we introduce a novel data augmentation trick which permits efficient Gibbs sampling. The SDREM can be used for large sparse networks as its computational cost scales with the number of positive links. We demonstrate its competitive performance through improved link prediction performance on a range of real-world datasets.
Submitted 4 November, 2019;
originally announced November 2019.
-
Scalable Inference for Nonparametric Hawkes Process Using Pólya-Gamma Augmentation
Authors:
Feng Zhou,
Zhidong Li,
Xuhui Fan,
Yang Wang,
Arcot Sowmya,
Fang Chen
Abstract:
In this paper, we consider the sigmoid Gaussian Hawkes process model: the baseline intensity and triggering kernel of the Hawkes process are both modeled as the sigmoid transformation of random trajectories drawn from Gaussian processes (GP). By introducing auxiliary latent random variables (branching structure, Pólya-Gamma random variables and latent marked Poisson processes), the likelihood is converted to two decoupled components with a Gaussian form, which allows for efficient conjugate analytical inference. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum a posteriori (MAP) estimate. Furthermore, we extend the EM algorithm to an efficient approximate Bayesian inference algorithm: mean-field variational inference. We demonstrate the performance of the two algorithms on simulated data. Experiments on real data show that our proposed inference algorithms can efficiently recover the underlying prompting characteristics.
Submitted 28 October, 2019;
originally announced October 2019.