Showing 1–13 of 13 results for author: Zhai, S

  1. arXiv:2506.21757  [pdf, ps, other]

    stat.ML cs.LG

    TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics

    Authors: Tianrong Chen, Huangjie Zheng, David Berthelot, Jiatao Gu, Josh Susskind, Shuangfei Zhai

    Abstract: Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is up to $186\%$ faster than the current state of the art solver for comparative FID on Im…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2502.12786  [pdf, other]

    stat.ML cs.LG

    Composition and Control with Distilled Energy Diffusion Models and Sequential Monte Carlo

    Authors: James Thornton, Louis Bethune, Ruixiang Zhang, Arwen Bradley, Preetum Nakkiran, Shuangfei Zhai

    Abstract: Diffusion models may be formulated as a time-indexed sequence of energy-based models, where the score corresponds to the negative gradient of an energy function. As opposed to learning the score directly, an energy parameterization is attractive as the energy itself can be used to control generation via Monte Carlo samplers. Architectural constraints and training instability in energy parameterize…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: Initial submission to openreview on October 3, 2024 (https://openreview.net/forum?id=6GyX0YRw8P); accepted to AISTATS 2025

  3. arXiv:2410.08378  [pdf, other]

    stat.CO stat.ME stat.ML

    Deep Generative Quantile Bayes

    Authors: Jungeum Kim, Percy S. Zhai, Veronika Ročková

    Abstract: We develop a multivariate posterior sampling procedure through deep generative quantile learning. Simulation proceeds implicitly through a push-forward mapping that can transform i.i.d. random vector samples from the posterior. We utilize Monge-Kantorovich depth in multivariate quantiles to directly sample from Bayesian credible sets, a unique feature not offered by typical posterior sampling meth…

    Submitted 29 April, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  4. arXiv:2406.00633  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Improving GFlowNets for Text-to-Image Diffusion Alignment

    Authors: Dinghuai Zhang, Yizhe Zhang, Jiatao Gu, Ruixiang Zhang, Josh Susskind, Navdeep Jaitly, Shuangfei Zhai

    Abstract: Diffusion models have become the de-facto approach for generating visual data, which are trained to match the distribution of the training dataset. In addition, we also want to control generation to fulfill desired properties such as alignment to a text description, which can be specified with a black-box reward function. Prior works fine-tune pretrained diffusion models to achieve this goal throu…

    Submitted 25 December, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  5. arXiv:2303.06296  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Stabilizing Transformer Training by Preventing Attention Entropy Collapse

    Authors: Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind

    Abstract: Training stability is of great importance to Transformers. In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. In particular, we track the attention entropy for each attention head during the course of training, which is a proxy for model sharpness. We identify a common pattern across different architectures and tasks, where low at…

    Submitted 25 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Journal ref: In International Conference on Machine Learning (pp. 40770-40803). PMLR. 2023

  6. arXiv:2105.02487  [pdf, other]

    stat.ML cs.LG stat.ME

    High-dimensional Functional Graphical Model Structure Learning via Neighborhood Selection Approach

    Authors: Boxin Zhao, Percy S. Zhai, Y. Samuel Wang, Mladen Kolar

    Abstract: Undirected graphical models are widely used to model the conditional independence structure of vector-valued data. However, in many modern applications, for example those involving EEG and fMRI data, observations are more appropriately modeled as multivariate random functions rather than vectors. Functional graphical models have been proposed to model the conditional independence structure of such…

    Submitted 25 January, 2024; v1 submitted 6 May, 2021; originally announced May 2021.

  7. arXiv:2006.10705  [pdf, other]

    cs.LG cs.CV stat.ML

    Set Distribution Networks: a Generative Model for Sets of Images

    Authors: Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Carlos Guestrin, Josh M. Susskind

    Abstract: Images with shared characteristics naturally form sets. For example, in a face verification benchmark, images of the same identity form sets. For generative models, the standard way of dealing with sets is to represent each as a one hot vector, and learn a conditional generative model $p(\mathbf{x}|\mathbf{y})$. This representation assumes that the number of sets is limited and known, such that th…

    Submitted 18 June, 2020; originally announced June 2020.

  8. arXiv:2006.07678  [pdf, other]

    cs.LG stat.ML

    Collegial Ensembles

    Authors: Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan

    Abstract: Modern neural network performance typically improves as model size increases. A recent line of research on the Neural Tangent Kernel (NTK) of over-parameterized networks indicates that the improvement with size increase is a product of a better conditioned loss landscape. In this work, we investigate a form of over-parameterization achieved through ensembling, where we define collegial ensembles (…

    Submitted 17 June, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

  9. arXiv:1910.13101  [pdf, other]

    cs.LG stat.ML

    Adversarial Fisher Vectors for Unsupervised Representation Learning

    Authors: Shuangfei Zhai, Walter Talbott, Carlos Guestrin, Joshua M. Susskind

    Abstract: We examine Generative Adversarial Networks (GANs) through the lens of deep Energy Based Models (EBMs), with the goal of exploiting the density model that follows from this formulation. In contrast to a traditional view where the discriminator learns a constant function when reaching convergence, here we show that it can provide useful information for downstream tasks, e.g., feature extraction for…

    Submitted 29 October, 2019; originally announced October 2019.

    Comments: Accepted as spotlight presentation to NeurIPS 2019

  10. arXiv:1905.05895  [pdf, other]

    cs.LG cs.CV stat.ML

    Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

    Authors: Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind

    Abstract: In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. We propose a sample efficient reinforcement learning approach for adapting the loss dynamically during training. We empir…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: Accepted to ICML 2019

  11. arXiv:1709.01648  [pdf, other]

    cs.LG stat.ML

    Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records

    Authors: Zhengping Che, Yu Cheng, Shuangfei Zhai, Zhaonan Sun, Yan Liu

    Abstract: The rapid growth of Electronic Health Records (EHRs), as well as the accompanied opportunities in Data-Driven Healthcare (DDH), has been attracting widespread interests and attentions. Recent progress in the design and applications of deep learning methods has shown promising results and is forcing massive changes in healthcare academia and industry, but most of these methods rely on massive label…

    Submitted 5 September, 2017; originally announced September 2017.

    Comments: To appear in ICDM 2017. This is the full version of paper with 8 pages

  12. arXiv:1611.08737  [pdf, other]

    cs.LG cs.CL stat.ML

    Structural Correspondence Learning for Cross-lingual Sentiment Classification with One-to-many Mappings

    Authors: Nana Li, Shuangfei Zhai, Zhongfei Zhang, Boying Liu

    Abstract: Structural correspondence learning (SCL) is an effective method for cross-lingual sentiment classification. This approach uses unlabeled documents along with a word translation oracle to automatically induce task specific, cross-lingual correspondences. It transfers knowledge through identifying important features, i.e., pivot features. For simplicity, however, it assumes that the word translation…

    Submitted 26 November, 2016; originally announced November 2016.

    Comments: To appear in AAAI 2017. arXiv admin note: text overlap with arXiv:1008.0716 by other authors

  13. arXiv:1605.07717  [pdf, other]

    cs.LG stat.ML

    Deep Structured Energy Based Models for Anomaly Detection

    Authors: Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang

    Abstract: In this paper, we attack the anomaly detection problem by directly modeling the data distribution with deep architectures. We propose deep structured energy based models (DSEBMs), where the energy function is the output of a deterministic deep neural network with structure. We develop novel model architectures to integrate EBMs with different types of data such as static data, sequential data, and…

    Submitted 15 June, 2016; v1 submitted 24 May, 2016; originally announced May 2016.

    Comments: To appear in ICML 2016