-
Covariates-Adjusted Mixed-Membership Estimation: A Novel Network Model with Optimal Guarantees
Authors:
Jianqing Fan,
Jiawei Ge,
Jikai Hou
Abstract:
This paper addresses the problem of mixed-membership estimation in networks, where the goal is to efficiently estimate the latent mixed-membership structure from the observed network. Recognizing the widespread availability and valuable information carried by node covariates, we propose a novel network model that incorporates both community information, as represented by the Degree-Corrected Mixed Membership (DCMM) model, and node covariate similarities to determine connections.
We investigate regularized maximum likelihood estimation (MLE) for this model and demonstrate that our approach achieves optimal estimation accuracy for both the similarity matrix and the mixed membership, in terms of both the Frobenius norm and the entrywise loss. Since directly analyzing the original convex optimization problem is intractable, we employ nonconvex optimization to facilitate the analysis. A key contribution of our work is identifying a crucial assumption that bridges the gap between convex and nonconvex solutions, enabling the transfer of statistical guarantees from the nonconvex approach to its convex counterpart. Importantly, our analysis extends beyond the MLE loss and the mean squared error (MSE) used in matrix completion problems, generalizing to all convex loss functions. Consequently, our analysis techniques extend to a broader set of applications, including ranking problems based on pairwise comparisons.
Finally, simulation experiments validate our theoretical findings, and real-world data analyses confirm the practical relevance of our model.
Submitted 10 February, 2025;
originally announced February 2025.
-
Adaptive radar detection of subspace-based distributed target in power heterogeneous clutter
Authors:
Daipeng Xiao,
Weijian Liu,
Jun Liu,
Lingyan Dai,
Xueli Fang,
Jianjun Ge
Abstract:
This paper investigates the problem of adaptive detection of distributed targets in power heterogeneous clutter. In the considered scenario, all the data share the same clutter covariance matrix structure, but with varying and unknown power mismatches. To address this problem, we iteratively estimate all the unknowns, including the coordinate matrix of the target, the clutter covariance matrix, and the corresponding power mismatches, and propose three detectors based on the generalized likelihood ratio test (GLRT), the Rao test, and the Wald test. Results on both simulated and real data show that the detectors based on the GLRT and the Rao test have higher probabilities of detection (PDs) than the existing competitors. Among them, the Rao test-based detector exhibits the best overall detection performance. We also analyze the impact of the target extended dimensions, the signal subspace dimensions, and the number of training samples on the detection performance. Furthermore, simulation experiments demonstrate that the proposed detectors have a constant false alarm rate (CFAR) property with respect to the structure of the clutter covariance matrix.
Submitted 9 October, 2024; v1 submitted 21 September, 2024;
originally announced September 2024.
-
Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games
Authors:
Jiawei Ge,
Yuanhao Wang,
Wenzhe Li,
Chi Jin
Abstract:
This paper examines multiplayer symmetric constant-sum games with more than two players in a competitive setting, including examples like Mahjong, Poker, and various board and video games. In contrast to two-player zero-sum games, equilibria in multiplayer games are neither unique nor non-exploitable, failing to provide meaningful guarantees when competing against opponents who play different equilibria or non-equilibrium strategies. This gives rise to a series of long-lasting fundamental questions in multiplayer games regarding suitable objectives, solution concepts, and principled algorithms. This paper takes an initial step towards addressing these challenges by focusing on the natural objective of equal share -- securing an expected payoff of C/n in an n-player symmetric game with a total payoff of C. We rigorously identify the theoretical conditions under which achieving an equal share is tractable and design a series of efficient algorithms, inspired by no-regret learning, that provably attain approximate equal share across various settings. Furthermore, we provide complementary lower bounds that justify the sharpness of our theoretical results. Our experimental results highlight worst-case scenarios where meta-algorithms from prior state-of-the-art systems for multiplayer games fail to secure an equal share, while our algorithm succeeds, demonstrating the effectiveness of our approach.
Submitted 2 October, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift
Authors:
Jiawei Ge,
Debarghya Mukherjee,
Jianqing Fan
Abstract:
As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing uncertainties induced by their underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, under which we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis encompasses scenarios where the source and the target domain are related via i) a bounded density ratio, and ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement. Beyond illustrating the performance of our method through real-world datasets, we also delve into the theoretical details. This includes establishing rigorous theoretical guarantees, coupled with finite sample bounds, regarding the coverage and width of our prediction intervals. Our approach excels in practical applications and is underpinned by a solid theoretical framework, ensuring its reliability and effectiveness across diverse contexts.
Submitted 7 October, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift
Authors:
Jiawei Ge,
Shange Tang,
Jianqing Fan,
Cong Ma,
Chi Jin
Abstract:
A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization -- generalizing to target data whose distribution differs from that of source data. Despite its significant importance, the fundamental question of "what are the most effective algorithms for OOD generalization" remains open even under the standard setting of covariate shift. This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves minimax optimality for covariate shift under the well-specified setting. That is, no algorithm performs better than MLE in this setting (up to a constant factor), justifying that MLE is all you need. Our result holds for a very rich class of parametric models, and does not require any boundedness condition on the density ratio. We illustrate the wide applicability of our framework by instantiating it to three concrete examples -- linear regression, logistic regression, and phase retrieval. This paper further complements the study by proving that, under the misspecified setting, MLE is no longer the optimal choice, whereas the Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.
Submitted 27 November, 2023;
originally announced November 2023.
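To make the well-specified covariate-shift setting in the abstract above concrete, here is a minimal sketch, not the paper's experiments. It assumes a one-dimensional linear model with Gaussian noise, for which the MLE is least squares through the origin. The distributions, sample size, and names (`fit_mle_slope`, `TRUE_THETA`, `N_SOURCE`) are illustrative assumptions. The estimator sees only source data, and its excess risk is then evaluated under a shifted target covariate distribution.

```python
import random


def fit_mle_slope(xs, ys):
    """Gaussian-noise MLE for y = theta * x: least squares through the origin."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is reproducible
TRUE_THETA = 2.0        # hypothetical ground-truth parameter
N_SOURCE = 2000         # hypothetical source sample size

# Source covariates ~ N(0, 1); labels come from the well-specified model.
xs = [rng.gauss(0.0, 1.0) for _ in range(N_SOURCE)]
ys = [TRUE_THETA * x + rng.gauss(0.0, 1.0) for x in xs]

# Fit on source data only, with no reweighting by a density ratio.
theta_hat = fit_mle_slope(xs, ys)

# Target covariates ~ N(3, 1), so E_target[x^2] = 3^2 + 1 = 10.
# Under squared loss the target excess risk is (theta_hat - theta)^2 * E_target[x^2].
target_second_moment = 3.0 ** 2 + 1.0
target_excess_risk = (theta_hat - TRUE_THETA) ** 2 * target_second_moment

print(f"theta_hat = {theta_hat:.4f}, target excess risk = {target_excess_risk:.6f}")
```

Because the model is well specified, the same parameter governs both domains, so an accurate source-only estimate carries over to the target; the shift only changes the factor by which the parameter error is scaled.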
-
UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation
Authors:
Jianqing Fan,
Jiawei Ge,
Debarghya Mukherjee
Abstract:
Uncertainty quantification in prediction presents a compelling challenge with vast applications across various domains, including biomedical science, economics, and weather forecasting. There exists a wide array of methods for constructing prediction intervals, such as quantile regression and conformal prediction. However, practitioners often face the challenge of selecting the most suitable method for a specific real-world data problem. In response to this dilemma, we introduce a novel and universally applicable strategy called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA). This technique excels in efficiently aggregating multiple prediction intervals while maintaining a small average width of the prediction band and ensuring coverage. UTOPIA is grounded in linear or convex programming, making it straightforward to train and implement. In the specific case where the prediction methods are elementary basis functions, as in kernel and spline bases, our method becomes the construction of a prediction band. Our proposed methodologies are supported by theoretical guarantees on the coverage probability and the average width of the aggregated prediction interval, which are detailed in this paper. The practicality and effectiveness of UTOPIA are further validated through its application to synthetic data and two real-world datasets in finance and macroeconomics.
Submitted 13 July, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
On the Provable Advantage of Unsupervised Pretraining
Authors:
Jiawei Ge,
Shange Tang,
Jianqing Fan,
Chi Jin
Abstract:
Unsupervised pretraining, which learns a useful representation using a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems. Despite its tremendous empirical success, the rigorous theoretical understanding of why unsupervised pretraining generally helps remains rather limited -- most existing results are restricted to particular methods or approaches for unsupervised pretraining with specialized structural assumptions. This paper studies a generic framework, where the unsupervised representation learning task is specified by an abstract class of latent variable models $Φ$ and the downstream task is specified by a class of prediction functions $Ψ$. We consider a natural approach of using Maximum Likelihood Estimation (MLE) for unsupervised pretraining and Empirical Risk Minimization (ERM) for learning downstream tasks. We prove that, under a mild "informative" condition, our algorithm achieves an excess risk of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_Φ/m} + \sqrt{\mathcal{C}_Ψ/n})$ for downstream tasks, where $\mathcal{C}_Φ, \mathcal{C}_Ψ$ are complexity measures of the function classes $Φ, Ψ$, and $m, n$ are the numbers of unlabeled and labeled samples, respectively. Compared to the baseline of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_{Φ\circ Ψ}/n})$ achieved by performing supervised learning using only the labeled data, our result rigorously shows the benefit of unsupervised pretraining when $m \gg n$ and $\mathcal{C}_{Φ\circ Ψ} > \mathcal{C}_Ψ$. This paper further shows that our generic framework covers a wide range of approaches for unsupervised pretraining, including factor models, Gaussian mixture models, and contrastive learning.
Submitted 2 March, 2023;
originally announced March 2023.
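As a rough reading of the rates quoted in the abstract above (a sketch that ignores constants and logarithmic factors, not a statement taken from the paper): the pretraining term is no larger than the downstream term once $m \ge n\,\mathcal{C}_Φ/\mathcal{C}_Ψ$, so in that regime

$$\sqrt{\mathcal{C}_Φ/m} + \sqrt{\mathcal{C}_Ψ/n} \;\le\; 2\sqrt{\mathcal{C}_Ψ/n},$$

which is of smaller order than the supervised baseline $\sqrt{\mathcal{C}_{Φ\circ Ψ}/n}$ whenever $\mathcal{C}_{Φ\circ Ψ}$ is much larger than $\mathcal{C}_Ψ$. In other words, when unlabeled data is plentiful, the labeled sample size needed is driven by the complexity of $Ψ$ alone rather than by that of the composite class.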
-
Weakest link pruning of a dendrogram
Authors:
Jiacheng Ge,
Robert Tibshirani
Abstract:
Hierarchical clustering is a popular method for identifying distinct groups in a dataset. The most commonly used method for pruning a dendrogram is via a single horizontal cut. In this paper, we propose a new technique "weakest link optimal pruning". We prove its superiority over horizontal pruning and provide some examples illustrating how the two methods can behave quite differently.
Submitted 18 January, 2023; v1 submitted 10 December, 2022;
originally announced December 2022.
-
Semi-supervised Collaborative Filtering by Text-enhanced Domain Adaptation
Authors:
Wenhui Yu,
Xiao Lin,
Junfeng Ge,
Wenwu Ou,
Zheng Qin
Abstract:
Data sparsity is an inherent challenge in recommender systems, where most of the data is collected from users' implicit feedback. This causes two difficulties in designing effective algorithms: first, the majority of users have only a few interactions with the system, so there is not enough data for learning; second, implicit feedback contains no negative samples, and it is common practice to generate them by negative sampling. However, negative sampling mislabels many potential positive samples as negative, and data sparsity exacerbates the mislabeling problem. To address these difficulties, we treat recommendation on sparse implicit feedback as a semi-supervised learning task and explore domain adaptation to solve it. We transfer the knowledge learned from dense data to sparse data, focusing on the most challenging case, in which there is no user or item overlap. In this extreme case, directly aligning the embeddings of the two datasets is sub-optimal, since the two latent spaces encode very different information. We therefore adopt domain-invariant textual features as anchor points to align the latent spaces. To align the embeddings, we extract textual features for each user and item and feed them, together with the user and item embeddings, into a domain classifier. The embeddings are trained to confuse the classifier, while the textual features are fixed as anchor points. Through domain adaptation, the distribution pattern in the source domain is transferred to the target domain. Because the target part can be supervised by domain adaptation, we abandon negative sampling in the target dataset to avoid label noise. We use three pairs of real-world datasets to validate the effectiveness of our transfer strategy. Results show that our models significantly outperform existing models.
Submitted 28 June, 2020;
originally announced July 2020.
-
Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python
Authors:
Jason Ge,
Xingguo Li,
Haoming Jiang,
Han Liu,
Tong Zhang,
Mengdi Wang,
Tuo Zhao
Abstract:
We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression, and scaled sparse linear regression), combined with efficient active set selection strategies. The library also allows users to choose among different sparsity-inducing regularizers, including the convex $\ell_1$ regularizer and the nonconvex MCP and SCAD regularizers. The library is coded in C++ and has user-friendly R and Python wrappers. Numerical experiments demonstrate that picasso can scale up to large problems efficiently.
Submitted 26 June, 2020;
originally announced June 2020.
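The general idea of pathwise coordinate optimization mentioned in the abstract above can be sketched in a few lines of plain Python. This is a textbook-style illustration for $\ell_1$-penalized least squares only. It is not picasso's API or implementation, and it omits the active set selection and the MCP/SCAD regularizers. All function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the l1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, beta, n_sweeps=200):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = list(beta)  # copy so the warm start passed in is not mutated
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def lasso_path(X, y, lambdas):
    """Solve along a decreasing lambda sequence, warm-starting each solve."""
    beta = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta)
        path.append((lam, beta))
    return path


# Toy design with two orthogonal columns (illustrative data only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
path = lasso_path(X, y, [0.1, 2.0])
for lam, beta in path:
    print(lam, beta)
```

On this toy design the largest penalty zeroes out both coefficients, while the smaller one keeps only the first, at approximately 1.8. Warm-starting each solve from the previous solution is what makes the computation "pathwise".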
-
On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions
Authors:
Xingguo Li,
Lin F. Yang,
Jason Ge,
Jarvis Haupt,
Tong Zhang,
Tuo Zhao
Abstract:
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on the difference of convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures/assumptions (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory.
Submitted 15 February, 2018; v1 submitted 19 June, 2017;
originally announced June 2017.