Showing 1–50 of 141 results for author: Zhou, M

Searching in archive stat.
  1. arXiv:2506.06584  [pdf, ps, other]

    cs.LG stat.ML

    Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures

    Authors: Mo Zhou, Weihang Xu, Maryam Fazel, Simon S. Du

    Abstract: Learning Gaussian Mixture Models (GMMs) is a fundamental problem in machine learning, with the Expectation-Maximization (EM) algorithm and its popular variant gradient EM being arguably the most widely used algorithms in practice. In the exact-parameterized setting, where both the ground truth GMM and the learning model have the same number of components $m$, a vast line of work has aimed to estab…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 77 pages

  2. arXiv:2506.00290  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    DLM-One: Diffusion Language Models for One-Step Sequence Generation

    Authors: Tianqi Chen, Shujian Zhang, Mingyuan Zhou

    Abstract: This paper introduces DLM-One, a score-distillation-based framework for one-step sequence generation with continuous diffusion language models (DLMs). DLM-One eliminates the need for iterative refinement by aligning the scores of a student model's outputs in the continuous token embedding space with the score function of a pretrained teacher DLM. We investigate whether DLM-One can achieve substant…

    Submitted 30 May, 2025; originally announced June 2025.

  3. arXiv:2505.12674  [pdf, ps, other]

    cs.CV cs.LG stat.ML

    Few-Step Diffusion via Score identity Distillation

    Authors: Mingyuan Zhou, Yi Gu, Zhendong Wang

    Abstract: Diffusion distillation has emerged as a promising strategy for accelerating text-to-image (T2I) diffusion models by distilling a pretrained score network into a one- or few-step generator. While existing methods have made notable progress, they often rely on real or teacher-synthesized images to perform well when distilling high-resolution T2I diffusion models such as Stable Diffusion XL (SDXL), a…

    Submitted 18 May, 2025; originally announced May 2025.

  4. arXiv:2505.11444  [pdf, ps, other]

    cs.LG stat.AP stat.ME stat.ML

    A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation

    Authors: Xinran Song, Tianyu Chen, Mingyuan Zhou

    Abstract: Estimating individualized treatment effects from observational data is a central challenge in causal inference, largely due to covariate imbalance and confounding bias from non-randomized treatment assignment. While inverse probability weighting (IPW) is a well-established solution to this problem, its integration into modern deep learning frameworks remains limited. In this work, we propose Impor…

    Submitted 16 May, 2025; originally announced May 2025.

  5. arXiv:2504.09654  [pdf, other]

    stat.AP

    Integrated Bayesian non-parametric spatial modeling for cross-sample identification of spatially variable genes

    Authors: Meng Zhou, Shuangge Ma, Mengyun Wu

    Abstract: Spatial transcriptomics has revolutionized tissue analysis by simultaneously mapping gene expression, spatial topography, and histological context across consecutive tissue sections, enabling systematic investigation of spatial heterogeneity. The detection of spatially variable (SV) genes, which are molecular signatures with position-dependent expression, provides critical insights into disease me…

    Submitted 12 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

  6. arXiv:2502.20608  [pdf, other]

    stat.ME stat.AP stat.CO

    Modeling times to multiple events under informative censoring with C-vine copula

    Authors: Xinyuan Chen, Yiwei Li, Qian M. Zhou

    Abstract: The study of times to nonterminal events of different types and their interrelation is a compelling area of interest. The primary challenge in analyzing such multivariate event times is the presence of informative censoring by the terminal event. While numerous statistical methods have been proposed for a single nonterminal event, i.e., semi-competing risks data, there remains a dearth of tools fo…

    Submitted 27 February, 2025; originally announced February 2025.

  7. arXiv:2410.09696  [pdf, other]

    cs.LG stat.ML

    Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks

    Authors: Chaojie Wang, Xinyang Liu, Dongsheng Wang, Hao Zhang, Bo Chen, Mingyuan Zhou

    Abstract: Although existing variational graph autoencoders (VGAEs) have been widely used for modeling and generating graph-structured data, most of them are still not flexible enough to approximate the sparse and skewed latent node representations, especially those of document relational networks (DRNs) with discrete observations. To analyze a collection of interconnected documents, a typical branch of Baye…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: Submit to T-PAMI

  8. arXiv:2409.16311  [pdf]

    physics.ao-ph cs.HC stat.AP

    New Insights into Global Warming: End-to-End Visual Analysis and Prediction of Temperature Variations

    Authors: Meihua Zhou, Nan Wan, Tianlong Zheng, Hanwen Xu, Li Yang, Tingting Wang

    Abstract: Global warming presents an unprecedented challenge to our planet; however, comprehensive understanding remains hindered by geographical biases, temporal limitations, and lack of standardization in existing research. An end-to-end visual analysis of global warming using three distinct temperature datasets is presented. A baseline adjusted from the Paris Agreement's one point five degrees Celsius benchma…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: 28 pages

  9. arXiv:2406.09357  [pdf, other]

    cs.LG stat.ML

    Advancing Graph Generation through Beta Diffusion

    Authors: Xinyang Liu, Yilin He, Bo Chen, Mingyuan Zhou

    Abstract: Diffusion models have excelled in generating natural images and are now being adapted to a variety of data types, including graphs. However, conventional models often rely on Gaussian or categorical diffusion processes, which can struggle to accommodate the mixed discrete and continuous components characteristic of graph data. Graphs typically feature discrete structures and continuous node attrib…

    Submitted 5 October, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.01813  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP stat.ME

    Diffusion Boosted Trees

    Authors: Xizewen Han, Mingyuan Zhou

    Abstract: Combining the merits of both denoising diffusion probabilistic models and gradient boosting, the diffusion boosting paradigm is introduced for tackling supervised learning problems. We develop Diffusion Boosted Trees (DBT), which can be viewed as both a new denoising diffusion generative model parameterized by decision trees (one single tree for each diffusion timestep), and a new boosting algorit…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2406.01766  [pdf, other]

    cs.LG stat.ML

    How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

    Authors: Mo Zhou, Rong Ge

    Abstract: The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work highlighted the feature learnin…

    Submitted 4 November, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 camera ready version

  12. arXiv:2406.01561  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG stat.ML

    Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation

    Authors: Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang

    Abstract: Diffusion-based text-to-image generation models trained on extensive text-image pairs have demonstrated the ability to produce photorealistic images aligned with textual descriptions. However, a significant limitation of these models is their slow sample generation process, which requires iterative refinement through the same network. To overcome this, we introduce a data-free guided distillation…

    Submitted 8 February, 2025; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: ICLR 2025; fixed typos in Table 1; Code and model checkpoints available at https://github.com/mingyuanzhou/SiD-LSG; More efficient code using AMP is coming soon

  13. arXiv:2404.04057  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

    Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

    Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo…

    Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

  14. arXiv:2404.00109  [pdf, other]

    stat.AP

    Reverse stress testing via multivariate modeling with vine copulas

    Authors: Menglin Zhou, Natalia Nolde

    Abstract: As an important tool in financial risk management, stress testing aims to evaluate the stability of financial portfolios under some potential large shocks from extreme yet plausible scenarios of risk factors. The effectiveness of a stress test crucially depends on the choice of stress scenarios. In this paper we consider a pragmatic approach to stress scenario estimation that aims to address sever…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 35 pages, 23 figures

  15. Two-Stage Pseudo Maximum Likelihood Estimation of Semiparametric Copula-based Regression Models for Semi-Competing Risks Data

    Authors: Sakie J. Arachchige, Xinyuan Chen, Qian M. Zhou

    Abstract: We propose a two-stage estimation procedure for a copula-based model with semi-competing risks data, where the non-terminal event is subject to dependent censoring by the terminal event, and both events are subject to independent censoring. With a copula-based model, the marginal survival functions of individual event times are specified by semiparametric transformation models, and the dependence…

    Submitted 25 October, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 23 pages, 1 figure

    Journal ref: Lifetime Data Analysis (2024)

  16. arXiv:2310.08774  [pdf, other]

    q-bio.PE cs.LG stat.ML

    PhyloGFN: Phylogenetic inference with generative flow networks

    Authors: Mingyang Zhou, Zichao Yan, Elliot Layne, Nikolay Malkin, Dinghuai Zhang, Moksh Jain, Mathieu Blanchette, Yoshua Bengio

    Abstract: Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt…

    Submitted 24 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  17. arXiv:2310.06389  [pdf, other]

    cs.CV stat.ML

    Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

    Authors: Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou

    Abstract: Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling. While various techniques address these computational challenges, a less-explored issue is designing an efficient and adaptable network backbone for iterative refinement. Current options like U-Net and Vision Transformer often rely on resource-intensive deep netwo…

    Submitted 27 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  18. arXiv:2309.07867  [pdf, other]

    cs.LG cs.AI stat.CO stat.ME stat.ML

    Beta Diffusion

    Authors: Mingyuan Zhou, Tianqi Chen, Zhendong Wang, Huangjie Zheng

    Abstract: We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, giv…

    Submitted 24 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023

  19. arXiv:2305.18375  [pdf, other]

    cs.LG stat.ME stat.ML

    Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling

    Authors: Tianqi Chen, Mingyuan Zhou

    Abstract: Learning to denoise has emerged as a prominent paradigm to design state-of-the-art deep generative models for natural images. How to use it to model the distributions of both continuous real-valued data and categorical data has been well studied in recently proposed diffusion models. However, it is found in this paper to have limited ability in modeling some other types of data, such as count and…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  20. arXiv:2305.02499  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG stat.ML

    AutoML-GPT: Automatic Machine Learning with GPT

    Authors: Shujian Zhang, Chengyue Gong, Lemeng Wu, Xingchao Liu, Mingyuan Zhou

    Abstract: AI tasks encompass a wide range of domains and fields. While numerous AI models have been designed for specific tasks and applications, they often require considerable human efforts in finding the right model architecture, optimization algorithm, and hyperparameters. Recent advances in large language models (LLMs) like ChatGPT show remarkable capabilities in various aspects of reasoning, comprehen…

    Submitted 3 May, 2023; originally announced May 2023.

  21. arXiv:2305.00350  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models

    Authors: Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou

    Abstract: Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years. Though these big models have zero-shot capabilities, in general, labeled data are still required to adapt them to downstream tasks. To overcome this critical limitation, we propose an unsupervised fine-tuning framework to directly fine-tune the model or prompt…

    Submitted 29 April, 2023; originally announced May 2023.

    Comments: ICML 2023; PyTorch code is available at https://github.com/korawat-tanwisuth/POUF

  22. arXiv:2302.00257  [pdf, other]

    cs.LG stat.ML

    Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

    Authors: Mo Zhou, Rong Ge

    Abstract: In deep learning, often the training process finds an interpolator (a solution with 0 training loss), but the test loss is still low. This phenomenon, known as benign overfitting, is a major mystery that received a lot of recent attention. One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties for the interpolator, often ch…

    Submitted 25 May, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: ICML 2023 camera ready version

  23. arXiv:2210.03294  [pdf, other]

    cs.LG math.OC stat.ML

    Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

    Authors: Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge

    Abstract: Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than the stability threshold $2/η$ (where $η$ is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below $2/η$. While many other wel…

    Submitted 21 February, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: 53 pages, 19 figures

    ACM Class: I.2.6

  24. arXiv:2210.01019  [pdf, other]

    stat.ML cs.LG

    Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks

    Authors: Xiang Wang, Annie N. Wang, Mo Zhou, Rong Ge

    Abstract: Monotonic linear interpolation (MLI) - on the line connecting a random initialization with the minimizer it converges to, the loss and accuracy are monotonic - is a phenomenon that is commonly observed in the training of neural networks. Such a phenomenon may seem to suggest that optimization of neural networks is easy. In this paper, we show that the MLI property is not necessarily related to the…

    Submitted 14 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: ICLR 2023

  25. arXiv:2208.06193  [pdf, other]

    cs.LG stat.ML

    Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

    Authors: Zhendong Wang, Jonathan J Hunt, Mingyuan Zhou

    Abstract: Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by poli…

    Submitted 25 August, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: ICLR 2023

  26. arXiv:2206.07275  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    CARD: Classification and Regression Diffusion Models

    Authors: Xizewen Han, Huangjie Zheng, Mingyuan Zhou

    Abstract: Learning the distribution of a continuous or categorical response variable $\boldsymbol y$ given its covariates $\boldsymbol x$ is a fundamental problem in statistics and machine learning. Deep neural network-based supervised learning algorithms have made great progress in predicting the mean of $\boldsymbol y$ given $\boldsymbol x$, but they are often criticized for their ability to accurately ca…

    Submitted 6 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  27. arXiv:2206.06584  [pdf, other]

    stat.ML cs.LG stat.ME

    Probabilistic Conformal Prediction Using Conditional Random Samples

    Authors: Zhendong Wang, Ruijiang Gao, Mingzhang Yin, Mingyuan Zhou, David M. Blei

    Abstract: This paper proposes probabilistic conformal prediction (PCP), a predictive inference algorithm that estimates a target variable by a discontinuous predictive set. Given inputs, PCP constructs the predictive set based on random samples from an estimated generative model. It is efficient and compatible with either explicit or implicit conditional generative models. Theoretically, we show that PCP gua…

    Submitted 20 June, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

  28. arXiv:2206.02262  [pdf, other]

    cs.LG stat.ML

    Diffusion-GAN: Training GANs with Diffusion

    Authors: Zhendong Wang, Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

    Abstract: Generative adversarial networks (GANs) are challenging to train stably, and a promising remedy of injecting instance noise into the discriminator input has not been very effective in practice. In this paper, we propose Diffusion-GAN, a novel GAN framework that leverages a forward diffusion chain to generate Gaussian-mixture distributed instance noise. Diffusion-GAN consists of three components, in…

    Submitted 25 August, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Project homepage: https://github.com/Zhendong-Wang/Diffusion-GAN; ICLR 2023 camera ready version

  29. arXiv:2203.03539  [pdf, other]

    cs.CL cs.LG stat.ML

    Understanding The Robustness of Self-supervised Learning Through Topic Modeling

    Authors: Zeping Luo, Shiyou Wu, Cindy Weng, Mo Zhou, Rong Ge

    Abstract: Self-supervised learning has significantly improved the performance of many NLP tasks. However, how self-supervised learning discovers useful representations, and why it is better than traditional approaches such as probabilistic models, are still largely unknown. In this paper, we focus on the context of topic modeling and highlight a key advantage of self-supervised learning - when applied to…

    Submitted 27 February, 2023; v1 submitted 2 February, 2022; originally announced March 2022.

    Comments: Accepted at ICLR 2023. Camera ready version

  30. arXiv:2203.01570  [pdf, other]

    cs.LG stat.ME stat.ML

    Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

    Authors: Dongsheng Wang, Dandan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou

    Abstract: A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document and hence often suffers from poor performance in analyzing short documents. In addition, its parameter estimation often relies on approximate posterior inference…

    Submitted 14 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Proceedings of ICLR, 2022

  31. A density peaks clustering algorithm with sparse search and K-d tree

    Authors: Yunxiao Shan, Shu Li, Fuxiang Li, Yuxin Cui, Shuai Li, Ming Zhou, Xiang Li

    Abstract: Density peaks clustering has become a nova of clustering algorithm because of its simplicity and practicality. However, there is one main drawback: it is time-consuming due to its high computational complexity. Herein, a density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem. Firstly, a sparse distance matrix is calculated by using K-d tree to replace…

    Submitted 19 July, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: IEEE ACCESS

  32. arXiv:2202.11735  [pdf, other]

    stat.ML cs.LG math.ST

    Truncated LinUCB for Stochastic Linear Bandits

    Authors: Yanglei Song, Meng Zhou

    Abstract: This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension…

    Submitted 5 May, 2025; v1 submitted 23 February, 2022; originally announced February 2022.

  33. arXiv:2202.09673  [pdf, other]

    stat.ML cs.LG

    A Behavior Regularized Implicit Policy for Offline Reinforcement Learning

    Authors: Shentao Yang, Zhendong Wang, Huangjie Zheng, Yihao Feng, Mingyuan Zhou

    Abstract: Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. The lack of environmental interactions makes the policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. For training more effective agents, we propose a framework that supports learning a flexible yet well-regulariz…

    Submitted 7 October, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

    Comments: 33 pages, 3 figures, and 8 tables

  34. arXiv:2202.09671  [pdf, other]

    stat.ML cs.LG

    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders

    Authors: Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

    Abstract: Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they…

    Submitted 7 September, 2023; v1 submitted 19 February, 2022; originally announced February 2022.

    Comments: ICLR 2023 camera-ready version

  35. arXiv:2202.03233  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    A Variational Edge Partition Model for Supervised Graph Representation Learning

    Authors: Yilin He, Chaojie Wang, Hao Zhang, Bo Chen, Mingyuan Zhou

    Abstract: Graph neural networks (GNNs), which propagate the node features through the edges and learn how to transform the aggregated features under label supervision, have achieved great success in supervised feature extraction for both node-level and graph-level classification tasks. However, GNNs typically treat the graph structure as given and ignore how the edges are formed. This paper introduces a gra…

    Submitted 31 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 10 pages, 5 figures, 14 pages of appendix, accepted to NeurIPS 2022

  36. arXiv:2112.07823  [pdf, other]

    cs.LG stat.ML

    Bayesian Graph Contrastive Learning

    Authors: Arman Hasanzadeh, Mohammadreza Armandpour, Ehsan Hajiramezanali, Mingyuan Zhou, Nick Duffield, Krishna Narayanan

    Abstract: Contrastive learning has become a key component of self-supervised learning approaches for graph-structured data. Despite their success, existing graph contrastive learning methods are incapable of uncertainty quantification for node representations or their downstream tasks, limiting their application in high-stakes domains. In this paper, we propose a novel Bayesian perspective of graph contrast…

    Submitted 28 August, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

  37. arXiv:2110.14002  [pdf, other]

    cs.LG stat.ML

    CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator

    Authors: Alek Dimitriev, Mingyuan Zhou

    Abstract: Accurately backpropagating the gradient through categorical variables is a challenging task that arises in various domains, such as training discrete latent variable models. To this end, we propose CARMS, an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples. CARMS combines REINFORCE with copula based sampling to avoid…

    Submitted 26 October, 2021; originally announced October 2021.

  38. arXiv:2110.12567  [pdf, other]

    cs.LG cs.CL stat.ML

    Alignment Attention by Matching Key and Query Distributions

    Authors: Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou

    Abstract: The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains. Most such models use multi-head self-attention which is appealing for the ability to attend to information from different perspectives. This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and…

    Submitted 24 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021; Our code is publicly available at https://github.com/szhang42/alignment_attention

  39. arXiv:2110.12024  [pdf, other]

    cs.LG cs.CV stat.ML

    A Prototype-Oriented Framework for Unsupervised Domain Adaptation

    Authors: Korawat Tanwisuth, Xinjie Fan, Huangjie Zheng, Shujian Zhang, Hao Zhang, Bo Chen, Mingyuan Zhou

    Abstract: Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space. To avoid the sampling variability, class imbalance, and data-privacy concerns that often plague these methods, we instead provide a memory and computation-efficient probabilistic framework to extract class prototypes and align the target…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  40. arXiv:2110.09140  [pdf, other]

    cs.LG stat.ML

    Learning Prototype-oriented Set Representations for Meta-Learning

    Authors: Dandan Guo, Long Tian, Minghe Zhang, Mingyuan Zhou, Hongyuan Zha

    Abstract: Learning from set-structured data is a fundamental problem that has recently attracted increasing attention, where a series of summary networks are introduced to deal with the set input. In fact, many meta-learning problems can be treated as set-input tasks. Most existing summary networks aim to design different architectures for the input set in order to enforce permutation invariance. However, s…

    Submitted 7 March, 2023; v1 submitted 18 October, 2021; originally announced October 2021.

  41. arXiv:2109.09782  [pdf, ps, other]

    stat.ME

    Information matrix equivalence in the presence of censoring: A goodness-of-fit test for semiparametric copula models with multivariate survival data

    Authors: Qian M. Zhou

    Abstract: Various goodness-of-fit tests are designed based on the so-called information matrix equivalence: if the assumed model is correctly specified, two information matrices that are derived from the likelihood function are equivalent. In the literature, this principle has been established for the likelihood function with fully observed data, but it has not been verified under the likelihood for censore…

    Submitted 23 July, 2023; v1 submitted 20 September, 2021; originally announced September 2021.

    Comments: 39 pages, 5 figures

  42. arXiv:2108.06072  [pdf, other]

    physics.chem-ph stat.AP

    Efficient force field and energy emulation through partition of permutationally equivalent atoms

    Authors: Hao Li, Musen Zhou, Jessalyn Sebastian, Jianzhong Wu, Mengyang Gu

    Abstract: Gaussian process (GP) emulator has been used as a surrogate model for predicting force field and molecular potential, to overcome the computational bottleneck of molecular dynamics simulation. Integrating both atomic force and energy in predictions was found to be more accurate than using energy alone, yet it requires $O((NM)^3)$ computational operations for computing the likelihood function and m…

    Submitted 9 May, 2022; v1 submitted 13 August, 2021; originally announced August 2021.

  43. arXiv:2108.05839  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    Logit Attenuating Weight Normalization

    Authors: Aman Gupta, Rohan Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Sathiya Keerthi

    Abstract: Over-parameterized deep networks trained using gradient-based optimizers are a popular choice for solving classification and ranking problems. Without appropriately tuned $\ell_2$ regularization or weight decay, such networks have the tendency to make output scores (logits) and network weights large, causing training loss to become too small and the network to lose its adaptivity (ability to move…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: 23 pages

  44. arXiv:2106.06573  [pdf, ps, other]

    stat.ML cs.LG

    Understanding Deflation Process in Over-parametrized Tensor Decomposition

    Authors: Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou

    Abstract: In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems. Empirically, such training process often first fits larger components and then discovers smaller components, which is similar to a tensor deflation process that is commonly used in tensor decomposition algorithms. We prove that for orthogonally decomposable tensor, a slightly modified…

    Submitted 24 October, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Camera Ready

  45. arXiv:2106.05251  [pdf, other]

    cs.LG cs.CL stat.ML

    Bayesian Attention Belief Networks

    Authors: Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou

    Abstract: Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention while stochastic attention is less explored due to the optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierar…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  46. arXiv:2105.14141  [pdf, other]

    cs.LG stat.ML

    ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables

    Authors: Alek Dimitriev, Mingyuan Zhou

    Abstract: Estimating the gradients for binary variables is a task that arises frequently in various domains, such as training discrete latent variable models. What has been commonly used is a REINFORCE based Monte Carlo estimation method that uses either independent samples or pairs of negatively correlated samples. To better utilize more than two samples, we propose ARMS, an Antithetic REINFORCE-based Mult…

    Submitted 28 May, 2021; originally announced May 2021.

  47. Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning

    Authors: Dandan Guo, Ruiying Lu, Bo Chen, Zequn Zeng, Mingyuan Zhou

    Abstract: Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image. Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extract…

    Submitted 25 July, 2022; v1 submitted 10 May, 2021; originally announced May 2021.

  48. arXiv:2105.03746  [pdf, other]

    cs.LG cs.CV stat.ML

    Contrastive Attraction and Contrastive Repulsion for Representation Learning

    Authors: Huangjie Zheng, Xu Chen, Jiangchao Yao, Hongxia Yang, Chunyuan Li, Ya Zhang, Hao Zhang, Ivor Tsang, Jingren Zhou, Mingyuan Zhou

    Abstract: Contrastive learning (CL) methods effectively learn data representations in a self-supervision manner, where the encoder contrasts each positive sample over multiple negative samples via a one-vs-many softmax cross-entropy loss. By leveraging large amounts of unlabeled image data, recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet. However,…

    Submitted 11 August, 2023; v1 submitted 8 May, 2021; originally announced May 2021.

    Journal ref: Transactions on Machine Learning Research, 2023. ISSN 2835-8856

  49. arXiv:2104.02665  [pdf, other]

    stat.ME

    A new weighting method when not all the events are selected as cases in a nested case-control study

    Authors: Qian M. Zhou, Xuan Wang, Yingye Zheng, Tianxi Cai

    Abstract: Nested case-control (NCC) is a sampling method widely used for developing and evaluating risk models with expensive biomarkers on large prospective cohort studies. The biomarker values are typically obtained on a sub-cohort, consisting of all the events and a subset of non-events. However, when the number of events is not small, it might not be affordable to measure the biomarkers on all of them.…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 27 pages, 3 figures, 5 tables

  50. arXiv:2103.10060  [pdf, other]

    cs.LG stat.ML

    Approximating Probability Distributions by using Wasserstein Generative Adversarial Networks

    Authors: Yihang Gao, Michael K. Ng, Mingjie Zhou

    Abstract: Studied here are Wasserstein generative adversarial networks (WGANs) with GroupSort neural networks as their discriminators. It is shown that the error bound of the approximation for the target distribution depends on the width and depth (capacity) of the generators and discriminators and the number of samples in training. A quantified generalization bound is established for the Wasserstein distan…

    Submitted 29 June, 2023; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted by SIAM Journal on Mathematics of Data Science (SIMODS)

    MSC Class: 68Q32; 68T15; 68W40