Search | arXiv e-print repository

Sampling Binary Data by Denoising through Score Functions

Abstract: Gaussian smoothing and a probabilistic framework for denoising via the empirical Bayes formalism, i.e., the Tweedie-Miyasawa formula (TMF), are the two key ingredients in the success of score-based generative models in Euclidean spaces. Smoothing holds the key for easing the problem of learning and sampling in high dimensions, denoising is needed for recovering the original signal, and TMF ties these together via the score function of noisy data. In this work, we extend this paradigm to the problem of learning and sampling the distribution of binary data on the Boolean hypercube by adopting Bernoulli noise, instead of Gaussian noise, as a smoothing device. We first derive a TMF-like expression for the optimal denoiser for the Hamming loss, where a score function naturally appears. Sampling noisy binary data is then achieved using a Langevin-like sampler which we theoretically analyze for different noise levels. At high Bernoulli noise levels sampling becomes easy, akin to log-concave sampling in Euclidean spaces. In addition, we extend the sequential multi-measurement sampling of Saremi et al. (2024) to the binary setting where we can bring the "effective noise" down by sampling multiple noisy measurements at a fixed noise level, without the need for continuous-time stochastic processes. We validate our formalism and theoretical findings by experiments on synthetic data and binarized images.

Submitted 1 February, 2025; originally announced February 2025.

arXiv:2501.08508 [pdf, other]

Score-based 3D molecule generation with neural fields

Authors: Matthieu Kirchmeyer, Pedro O. Pinheiro, Saeed Saremi

Abstract: We introduce a new representation for 3D molecules based on their continuous atomic density fields. Using this representation, we propose a new model based on walk-jump sampling for unconditional 3D molecule generation in the continuous space using neural fields. Our model, FuncMol, encodes molecular fields into latent codes using a conditional neural field, samples noisy codes from a Gaussian-smoothed distribution with Langevin MCMC (walk), denoises these samples in a single step (jump), and finally decodes them into molecular fields. FuncMol performs all-atom generation of 3D molecules without assumptions on the molecular structure and scales well with the size of molecules, unlike most approaches. Our method achieves competitive results on drug-like molecules and easily scales to macro-cyclic peptides, with at least one order of magnitude faster sampling. The code is available at https://github.com/prescient-design/funcmol.

Submitted 14 January, 2025; originally announced January 2025.

Comments: NeurIPS 2024

arXiv:2410.14621 [pdf, other]

JAMUN: Transferable Molecular Conformational Ensemble Generation with Walk-Jump Sampling

Authors: Ameya Daigavane, Bodhi P. Vani, Saeed Saremi, Joseph Kleinhenz, Joshua Rackers

Abstract: Conformational ensembles of protein structures are immensely important both to understanding protein function, and for drug discovery in novel modalities such as cryptic pockets. Current techniques for sampling ensembles are computationally inefficient, or do not transfer to systems outside their training data. We present walk-Jump Accelerated Molecular ensembles with Universal Noise (JAMUN), a step towards the goal of efficiently sampling the Boltzmann distribution of arbitrary proteins. By extending Walk-Jump Sampling to point clouds, JAMUN enables ensemble generation at orders of magnitude faster rates than traditional molecular dynamics or state-of-the-art ML methods. Further, JAMUN is able to predict the stable basins of small peptides that were not seen during training.

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2407.03428 [pdf, other]

NEBULA: Neural Empirical Bayes Under Latent Representations for Efficient and Controllable Design of Molecular Libraries

Authors: Ewa M. Nowara, Pedro O. Pinheiro, Sai Pooja Mahajan, Omar Mahmood, Andrew Martin Watkins, Saeed Saremi, Michael Maser

Abstract: We present NEBULA, the first latent 3D generative model for scalable generation of large molecular libraries around a seed compound of interest. Such libraries are crucial for scientific discovery, but it remains challenging to generate large numbers of high quality samples efficiently. 3D-voxel-based methods have recently shown great promise for generating high quality samples de novo from random noise (Pinheiro et al., 2023). However, sampling in 3D-voxel space is computationally expensive and use in library generation is prohibitively slow. Here, we instead perform neural empirical Bayes sampling (Saremi & Hyvarinen, 2019) in the learned latent space of a vector-quantized variational autoencoder. NEBULA generates large molecular libraries nearly an order of magnitude faster than existing methods without sacrificing sample quality. Moreover, NEBULA generalizes better to unseen drug-like molecules, as demonstrated on two public datasets and multiple recently released drugs. We expect the approach herein to be highly enabling for machine learning-based drug discovery. The code is available at https://github.com/prescient-design/nebula

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2405.03961 [pdf, other]

Structure-based drug design by denoising voxel grids

Authors: Pedro O. Pinheiro, Arian Jamasb, Omar Mahmood, Vishnu Sresht, Saeed Saremi

Abstract: We present VoxBind, a new score-based generative model for 3D molecules conditioned on protein structures. Our approach represents molecules as 3D atomic density grids and leverages a 3D voxel-denoising network for learning and generation. We extend the neural empirical Bayes formalism (Saremi & Hyvarinen, 2019) to the conditional setting and generate structure-conditioned molecules with a two-step procedure: (i) sample noisy molecules from the Gaussian-smoothed conditional distribution with underdamped Langevin MCMC using the learned score function and (ii) estimate clean molecules from the noisy samples with single-step denoising. Compared to the current state of the art, our model is simpler to train, significantly faster to sample from, and achieves better results on extensive in silico benchmarks -- the generated molecules are more diverse, exhibit fewer steric clashes, and bind with higher affinity to protein pockets. The code is available at https://github.com/genentech/voxbind/.

Submitted 2 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2306.12360 [pdf, other]

Protein Discovery with Discrete Walk-Jump Sampling

Authors: Nathan C. Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi

Abstract: We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.

Submitted 15 March, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: ICLR 2024 oral presentation, top 1.2% of submissions; {ICLR 2023 Physics for Machine Learning, NeurIPS 2023 GenBio, MLCB 2023} Spotlight

Journal ref: The Twelfth International Conference on Learning Representations, 2024

arXiv:2306.07473 [pdf, other]

3D molecule generation by denoising voxel grids

Authors: Pedro O. Pinheiro, Joshua Rackers, Joseph Kleinhenz, Michael Maser, Omar Mahmood, Andrew Martin Watkins, Stephen Ra, Vishnu Sresht, Saeed Saremi

Abstract: We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework (Saremi and Hyvarinen, 2019) and generate molecules in two steps: (i) sample noisy density grids from a smooth distribution via underdamped Langevin Markov chain Monte Carlo, and (ii) recover the "clean" molecule by denoising the noisy grid with a single step. Our method, VoxMol, generates molecules in a fundamentally different way than the current state of the art (i.e., diffusion models applied to atom point clouds). It differs in terms of the data representation, the noise model, the network architecture and the generative modeling algorithm. Our experiments show that VoxMol captures the distribution of drug-like molecules better than state of the art, while being faster to generate samples.
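The two-step "walk" and "jump" procedure described in this abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a 1D data distribution that is uniform on {-1, +1}, uses the closed-form score of its Gaussian-smoothed density instead of a learned network, and swaps in plain overdamped (unadjusted) Langevin dynamics where the abstract uses underdamped Langevin MCMC. The function names `score`, `walk`, and `jump` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def score(y, sigma):
    """Score of p(y) = 0.5*N(y; -1, sigma^2) + 0.5*N(y; +1, sigma^2).

    Closed form for this toy mixture; a real model would use a learned network.
    """
    return (np.tanh(y / sigma**2) - y) / sigma**2


def walk(y0, sigma, step, n_steps, rng):
    """Walk: overdamped Langevin updates targeting the smoothed density.

    Assumption: overdamped dynamics are used here for brevity, not the
    underdamped sampler named in the abstract.
    """
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step * score(y, sigma) + np.sqrt(2.0 * step) * noise
    return y


def jump(y, sigma):
    """Jump: single-step denoising, x_hat = y + sigma^2 * score(y)."""
    return y + sigma**2 * score(y, sigma)


rng = np.random.default_rng(0)
sigma = 1.0
# Run 2000 independent chains, all started at zero.
y = walk(np.zeros(2000), sigma, step=0.05, n_steps=500, rng=rng)
x_hat = jump(y, sigma)
print("noisy sample variance:", y.var())
print("fraction of denoised samples with x_hat > 0:", (x_hat > 0).mean())
```

For this toy mixture the jump reduces to `tanh(y / sigma**2)`, so every denoised sample lies between the two data points -1 and +1.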

Submitted 8 March, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.19473 [pdf, other]

Chain of Log-Concave Markov Chains

Authors: Saeed Saremi, Ji Won Park, Francis Bach

Abstract: We introduce a theoretical framework for sampling from unnormalized densities based on a smoothing scheme that uses an isotropic Gaussian kernel with a single fixed noise scale. We prove one can decompose sampling from a density (minimal assumptions made on the density) into a sequence of sampling from log-concave conditional densities via accumulation of noisy measurements with equal noise levels. Our construction is unique in that it keeps track of a history of samples, making it non-Markovian as a whole, but it is lightweight algorithmically as the history only shows up in the form of a running empirical mean of samples. Our sampling algorithm generalizes walk-jump sampling (Saremi & Hyvärinen, 2019). The "walk" phase becomes a (non-Markovian) chain of (log-concave) Markov chains. The "jump" from the accumulated measurements is obtained by empirical Bayes. We study our sampling algorithm quantitatively using the 2-Wasserstein metric and compare it with various Langevin MCMC algorithms. We also report a remarkable capacity of our algorithm to "tunnel" between modes of a distribution.

Submitted 28 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2303.11669 [pdf, other]

Universal Smoothed Score Functions for Generative Modeling

Authors: Saeed Saremi, Rupesh Kumar Srivastava, Francis Bach

Abstract: We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using factorial kernels with $M$ independent Gaussian channels with equal noise levels introduced by Saremi and Srivastava (2022). First, we fully characterize the time complexity of learning the resulting smoothed density in $\mathbb{R}^{Md}$, called M-density, by deriving a universal form for its parametrization in which the score function is by construction permutation equivariant. Next, we study the time complexity of sampling an M-density by analyzing its condition number for Gaussian distributions. This spectral analysis gives a geometric insight on the "shape" of M-densities as one increases $M$. Finally, we present results on the sample quality in this class of generative models on the CIFAR-10 dataset where we report Fréchet inception distances (14.15), notably obtained with a single noise level on long-run fast-mixing MCMC chains.

Submitted 21 March, 2023; originally announced March 2023.

Comments: Technical Report

arXiv:2210.04096 [pdf, other]

PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design

Authors: Ji Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho

Abstract: Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with a hierarchical dependency structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified $\textit{partial ordering}$ in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of the traditional multi-objective BO to impose this desired ordering on the objectives, e.g. expression $\rightarrow$ affinity. We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, toy numerical problem, and a real-world antibody design task.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 9 pages, 7 figures. Submitted to NeurIPS 2022 AI4Science Workshop

arXiv:2112.09822 [pdf, other]

Multimeasurement Generative Models

Authors: Saeed Saremi, Rupesh Kumar Srivastava

Abstract: We formally map the problem of sampling from an unknown distribution with a density in $\mathbb{R}^d$ to the problem of learning and sampling a smoother density in $\mathbb{R}^{Md}$ obtained by convolution with a fixed factorial kernel: the new density is referred to as M-density and the kernel as multimeasurement noise model (MNM). The M-density in $\mathbb{R}^{Md}$ is smoother than the original density in $\mathbb{R}^d$, easier to learn and sample from, yet for large $M$ the two problems are mathematically equivalent since clean data can be estimated exactly given a multimeasurement noisy observation using the Bayes estimator. To formulate the problem, we derive the Bayes estimator for Poisson and Gaussian MNMs in closed form in terms of the unnormalized M-density. This leads to a simple least-squares objective for learning parametric energy and score functions. We present various parametrization schemes of interest including one in which studying Gaussian M-densities directly leads to multidenoising autoencoders--this is the first theoretical connection made between denoising autoencoders and empirical Bayes in the literature. Samples in $\mathbb{R}^d$ are obtained by walk-jump sampling (Saremi & Hyvarinen, 2019) via underdamped Langevin MCMC (walk) to sample from M-density and the multimeasurement Bayes estimation (jump). We study permutation invariant Gaussian M-densities on MNIST, CIFAR-10, and FFHQ-256 datasets, and demonstrate the effectiveness of this framework for realizing fast-mixing stable Markov chains in high dimensions.

Submitted 16 June, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

Comments: Our code is publicly available at https://github.com/nnaisense/mems

Journal ref: International Conference on Learning Representations, 2022

arXiv:2101.11890 [pdf, other]

Automatic design of novel potential 3CL$^{\text{pro}}$ and PL$^{\text{pro}}$ inhibitors

Authors: Timothy Atkinson, Saeed Saremi, Faustino Gomez, Jonathan Masci

Abstract: With the goal of designing novel inhibitors for SARS-CoV-1 and SARS-CoV-2, we propose the general molecule optimization framework, Molecular Neural Assay Search (MONAS), consisting of three components: a property predictor which identifies molecules with specific desirable properties, an energy model which approximates the statistical similarity of a given molecule to known training molecules, and a molecule search method. In this work, these components are instantiated with graph neural networks (GNNs), Deep Energy Estimator Networks (DEEN) and Monte Carlo tree search (MCTS), respectively. This implementation is used to identify 120K molecules (out of 40-million explored) which the GNN determined to be likely SARS-CoV-1 inhibitors, and, at the same time, are statistically close to the dataset used to train the GNN.

Submitted 29 January, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

arXiv:2007.15130 [pdf, other]

Unnormalized Variational Bayes

Authors: Saeed Saremi

Abstract: We unify empirical Bayes and variational Bayes for approximating unnormalized densities. This framework, named unnormalized variational Bayes (UVB), is based on formulating a latent variable model for the random variable $Y=X+N(0,σ^2 I_d)$ and using the evidence lower bound (ELBO), computed by a variational autoencoder, as a parametrization of the energy function of $Y$ which is then used to estimate $X$ with the empirical Bayes least-squares estimator. In this intriguing setup, the $\textit{gradient}$ of the ELBO with respect to noisy inputs plays the central role in learning the energy function. Empirically, we demonstrate that UVB has a higher capacity to approximate energy functions than the parametrization with MLPs as done in neural empirical Bayes (DEEN). We especially showcase $σ=1$, where the differences between UVB and DEEN become visible and qualitative in the denoising experiments. For this high level of noise, the distribution of $Y$ is very smoothed and we demonstrate that one can traverse in a single run $-$ without a restart $-$ all MNIST classes in a variety of styles via walk-jump sampling with a fast-mixing Langevin MCMC sampler. We finish by probing the encoder/decoder of the trained models and confirm UVB $\neq$ VAE.
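For readers unfamiliar with the empirical Bayes least-squares estimator this abstract refers to, its standard closed form for the Gaussian noise model $Y=X+N(0,σ^2 I_d)$ is the Tweedie-Miyasawa formula named in the first entry of this listing. The symbols $p$ (density of $Y$) and $f$ (its energy function) are notation assumed here for illustration; the abstract itself does not write the formula out.

```latex
\widehat{x}(y) \;=\; \mathbb{E}[X \mid Y = y]
\;=\; y + \sigma^2 \nabla_y \log p(y)
\;=\; y - \sigma^2 \nabla_y f(y),
\qquad p(y) \propto e^{-f(y)}.
```

Under this form, any differentiable parametrization of the energy $f$ yields a denoiser through its gradient with respect to the noisy input.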

Submitted 29 July, 2020; originally announced July 2020.

Comments: Submitted to Journal of Machine Learning Research

arXiv:2005.09047 [pdf, other]

Learning and Inference in Imaginary Noise Models

Authors: Saeed Saremi

Abstract: Inspired by recent developments in learning smoothed densities with empirical Bayes, we study variational autoencoders with a decoder that is tailored for the random variable $Y=X+N(0,σ^2 I_d)$. A notion of smoothed variational inference emerges where the smoothing is implicitly enforced by the noise model of the decoder; "implicit", since during training the encoder only sees clean samples. This is the concept of imaginary noise model, where the noise model dictates the functional form of the variational lower bound $\mathcal{L}(σ)$, but the noisy data are never seen during learning. The model is named $σ$-VAE. We prove that all $σ$-VAEs are equivalent to each other via a simple $β$-VAE expansion: $\mathcal{L}(σ_2) \equiv \mathcal{L}(σ_1,β)$, where $β=σ_2^2/σ_1^2$. We prove a similar result for the Laplace distribution in exponential families. Empirically, we report an intriguing power law $\mathcal{D}_{\rm KL} \sim σ^{-ν}$ for the learned models and we study the inference in the $σ$-VAE for unseen noisy data. The experiments were performed on MNIST, where we show that quite remarkably the model can make reasonable inferences on extremely noisy samples even though it has not seen any during training. The vanilla VAE completely breaks down in this regime. We finish with a hypothesis (the XYZ hypothesis) on the findings here.

Submitted 5 June, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

arXiv:2005.04504 [pdf, other]

Provable Robust Classification via Learned Smoothed Densities

Authors: Saeed Saremi, Rupesh Srivastava

Abstract: Smoothing classifiers and probability density functions with Gaussian kernels appear unrelated, but in this work, they are unified for the problem of robust classification. The key building block is approximating the $\textit{energy function}$ of the random variable $Y=X+N(0,σ^2 I_d)$ with a neural network which we use to formulate the problem of robust classification in terms of $\widehat{x}(Y)$, the $\textit{Bayes estimator}$ of $X$ given the noisy measurements $Y$. We introduce $\textit{empirical Bayes smoothed classifiers}$ within the framework of $\textit{randomized smoothing}$ and study it theoretically for the two-class linear classifier, where we show one can improve their robustness above $\textit{the margin}$. We test the theory on MNIST and we show that with a learned smoothed energy function and a linear classifier we can achieve provable $\ell_2$ robust accuracies that are competitive with empirical defenses. This setup can be significantly improved by $\textit{learning}$ empirical Bayes smoothed classifiers with adversarial training and on MNIST we show that we can achieve provable robust accuracies higher than the state-of-the-art empirical defenses in a range of radii. We discuss some fundamental challenges of randomized smoothing based on a geometric interpretation due to concentration of Gaussians in high dimensions, and we finish the paper with a proposal for using walk-jump sampling, itself based on learned smoothed densities, for robust classification.

Submitted 9 May, 2020; originally announced May 2020.

Comments: 24 pages, 6 figures

arXiv:1912.03845 [pdf, other]

No Representation without Transformation

Authors: Giorgio Giannone, Saeed Saremi, Jonathan Masci, Christian Osendorfer

Abstract: We extend the framework of variational autoencoders to represent transformations explicitly in the latent space. In the family of hierarchical graphical models that emerges, the latent space is populated by higher order objects that are inferred jointly with the latent representations they act on. To explicitly demonstrate the effect of these higher order objects, we show that the inferred latent transformations reflect interpretable properties in the observation space. Furthermore, the model is structured in such a way that in the absence of transformations, we can run inference and obtain generative capabilities comparable with standard variational autoencoders. Finally, utilizing the trained encoder, we outperform the baselines by a wide margin on a challenging out-of-distribution classification task.

Submitted 23 April, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

Comments: Preprint. Accepted at BDL and PGR workshops at NeurIPS 2019

arXiv:1912.03257 [pdf]

Piezoresponse phase as variable in electromechanical characterization

Authors: Sabine M. Neumayer, Sahar Saremi, Lane W. Martin, Liam Collins, Alexander Tselev, Stephen Jesse, Sergei V. Kalinin, Nina Balke

Abstract: Piezoresponse force microscopy (PFM) is a powerful characterization technique to readily image and manipulate ferroelectric domains. PFM gives insight into the strength of local piezoelectric coupling as well as polarization direction through PFM amplitude and phase, respectively. Converting measured arbitrary units to physical material parameters, however, remains a challenge. While much effort has been spent on quantifying the PFM amplitude signal, little attention has been given to the PFM phase and it is often arbitrarily adjusted to fit expectations or processed as recorded. This is problematic when investigating materials with unknown or potentially negative sign of the probed effective electrostrictive coefficient or strong frequency dispersion of electromechanical responses since assumptions about the phase cannot be reliably made. The PFM phase can, however, provide important information on the polarization orientation and the sign of the electrostrictive coefficient. Most notably, the orientation of the PFM hysteresis loop is determined by the PFM phase. Moreover, when presenting PFM data as a combined signal, the resulting response can be artificially lowered or asymmetric if the phase data has not been correctly processed. Here, we demonstrate a path to identify the phase offset required to extract correct meaning from PFM phase data. We explore different sources of phase offsets including the experimental setup, instrumental contributions, and data analysis. We discuss the physical working principles of PFM and develop a strategy to extract physical meaning from the PFM phase. The proposed procedures are verified on two materials with positive and negative piezoelectric coefficients.

Submitted 6 December, 2019; originally announced December 2019.

Comments: 16 pages, 6 figures

arXiv:1910.12744 [pdf, ps, other]

On approximating $\nabla f$ with neural networks

Authors: Saeed Saremi

Abstract: Consider a feedforward neural network $ψ: \mathbb{R}^d\rightarrow \mathbb{R}^d$ such that $ψ\approx \nabla f$, where $f:\mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth function, therefore $ψ$ must satisfy $\partial_j ψ_i = \partial_i ψ_j$ pointwise. We prove a theorem that a $ψ$ network with more than one hidden layer can only represent one feature in its first hidden layer; this is a dramatic departure from the well-known results for one hidden layer. The proof of the theorem is straightforward, where two backward paths and a weight-tying matrix play the key roles. We then present the alternative, the implicit parametrization, where the neural network is $φ: \mathbb{R}^d \rightarrow \mathbb{R}$ and $\nabla φ\approx \nabla f$; in addition, a "soft analysis" of $\nabla φ$ gives a dual perspective on the theorem. Throughout, we come back to recent probabilistic models that are formulated as $\nabla φ\approx \nabla f$, and conclude with a critique of denoising autoencoders.

Submitted 6 November, 2019; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: 10 pages

arXiv:1903.02334 [pdf, other]

Neural Empirical Bayes

Authors: Saeed Saremi, Aapo Hyvarinen

Abstract: We unify $\textit{kernel density estimation}$ and $\textit{empirical Bayes}$ and address a set of problems in unsupervised learning with a geometric interpretation of those methods, rooted in the $\textit{concentration of measure}$ phenomenon. Kernel density is viewed symbolically as $X\rightharpoonup Y$ where the random variable $X$ is smoothed to $Y= X+N(0,σ^2 I_d)$, and empirical Bayes is the machinery to denoise in a least-squares sense, which we express as $X \leftharpoondown Y$. A learning objective is derived by combining these two, symbolically captured by $X \rightleftharpoons Y$. Crucially, instead of using the original nonparametric estimators, we parametrize $\textit{the energy function}$ with a neural network denoted by $φ$; at optimality, $\nabla φ\approx -\nabla \log f$ where $f$ is the density of $Y$. The optimization problem is abstracted as interactions of high-dimensional spheres which emerge due to the concentration of isotropic gaussians. We introduce two algorithmic frameworks based on this machinery: (i) a "walk-jump" sampling scheme that combines Langevin MCMC (walks) and empirical Bayes (jumps), and (ii) a probabilistic framework for $\textit{associative memory}$, called NEBULA, defined à la Hopfield by the $\textit{gradient flow}$ of the learned energy to a set of attractors. We finish the paper by reporting the emergence of very rich "creative memories" as attractors of NEBULA for highly-overlapping spheres.

Submitted 21 April, 2020; v1 submitted 6 March, 2019; originally announced March 2019.

Comments: 23 pages, 10 figures

Journal ref: Journal of Machine Learning Research 20(181), 1-23, 2019

arXiv:1805.08306 [pdf, other]

Deep Energy Estimator Networks

Authors: Saeed Saremi, Arash Mehrjou, Bernhard Schölkopf, Aapo Hyvärinen

Abstract: Density estimation is a fundamental problem in statistical learning. This problem is especially challenging for complex high-dimensional data due to the curse of dimensionality. A promising solution to this problem is given here in an inference-free hierarchical framework that is built on score matching. We revisit the Bayesian interpretation of the score function and the Parzen score matching, and construct a multilayer perceptron with a scalable objective for learning the energy (i.e. the unnormalized log-density), which is then optimized with stochastic gradient descent. In addition, the resulting deep energy estimator network (DEEN) is designed as products of experts. We present the utility of DEEN in learning the energy, the score function, and in single-step denoising experiments for synthetic and high-dimensional data. We also diagnose stability problems in the direct estimation of the score function that had been observed for denoising autoencoders.

Submitted 21 May, 2018; originally announced May 2018.

arXiv:1705.07505 [pdf, other]

Annealed Generative Adversarial Networks

Authors: Arash Mehrjou, Bernhard Schölkopf, Saeed Saremi

Abstract: We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed β-GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, β-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.

Submitted 21 May, 2017; originally announced May 2017.

Comments: 9 pages, 6 figures

arXiv:1610.00385 [pdf]

doi 10.1021/acs.nanolett.6b03785

Pressurizing Field-Effect Transistors of Few-Layer MoS2 in a Diamond Anvil Cell

Authors: Yabin Chen, Feng Ke, Penghong Ci, Changhyun Ko, Taegyun Park, Sahar Saremi, Huili Liu, Yeonbae Lee, Joonki Suh, Lane W. Martin, Joel W. Ager, Bin Chen, Junqiao Wu

Abstract: Hydrostatic pressure applied using diamond anvil cells (DAC) has been widely explored to modulate physical properties of materials by tuning their lattice degree of freedom. Independently, electrical field is able to tune the electronic degree of freedom of functional materials via, for example, the field-effect transistor (FET) configuration. Combining these two orthogonal approaches would allow discovery of new physical properties and phases going beyond the known phase space. Such experiments are, however, technically challenging and have not been demonstrated. Herein, we report a feasible strategy to prepare and measure FETs in a DAC by lithographically patterning the nanodevices onto the diamond culet. Multiple-terminal FETs were fabricated in the DAC using few-layer MoS2 and BN as the channel semiconductor and dielectric layer, respectively. It is found that the mobility, conductance, carrier concentration, and contact conductance of MoS2 can all be significantly enhanced with pressure. We expect that the approach could enable unprecedented ways to explore new phases and properties of materials under coupled mechano-electrostatic modulation.

Submitted 2 October, 2016; originally announced October 2016.

Comments: 15 pages, 5 figures

arXiv:1510.07740 [pdf, other]

The Wilson Machine for Image Modeling

Authors: Saeed Saremi, Terrence J. Sejnowski

Abstract: Learning the distribution of natural images is one of the hardest and most important problems in machine learning. The problem remains open, because the enormous complexity of the structures in natural images spans all length scales. We break down the complexity of the problem and show that the hierarchy of structures in natural images fuels a new class of learning algorithms based on the theory of critical phenomena and stochastic processes. We approach this problem from the perspective of the theory of critical phenomena, which was developed in condensed matter physics to address problems with infinite length-scale fluctuations, and build a framework to integrate the criticality of natural images into a learning algorithm. The problem is broken down by mapping images into a hierarchy of binary images, called bitplanes. In this representation, the top bitplane is critical, having fluctuations in structures over a vast range of scales. The bitplanes below go through a gradual stochastic heating process to disorder. We turn this representation into a directed probabilistic graphical model, transforming the learning problem into the unsupervised learning of the distribution of the critical bitplane and the supervised learning of the conditional distributions for the remaining bitplanes. We learnt the conditional distributions by logistic regression in a convolutional architecture. Conditioned on the critical binary image, this simple architecture can generate large, natural-looking images, with many shades of gray, without the use of hidden units, unprecedented in the studies of natural images. The framework presented here is a major step in bringing criticality and stochastic processes to machine learning and in studying natural image statistics.

Submitted 11 November, 2015; v1 submitted 26 October, 2015; originally announced October 2015.

arXiv:1406.6311 [pdf]

doi 10.1140/epjc/s10052-014-3026-9

The Physics of the B Factories

Authors: A. J. Bevan, B. Golob, Th. Mannel, S. Prell, B. D. Yabsley, K. Abe, H. Aihara, F. Anulli, N. Arnaud, T. Aushev, M. Beneke, J. Beringer, F. Bianchi, I. I. Bigi, M. Bona, N. Brambilla, J. Brodzicka, P. Chang, M. J. Charles, C. H. Cheng, H.-Y. Cheng, R. Chistov, P. Colangelo, J. P. Coleman, A. Drutskoy, et al. (2009 additional authors not shown)

Abstract: This work is on the Physics of the B Factories. Part A of this book contains a brief description of the SLAC and KEK B Factories as well as their detectors, BaBar and Belle, and data taking related issues. Part B discusses tools and methods used by the experiments in order to obtain results. The results themselves can be found in Part C. Please note that version 3 on the archive is the auxiliary version of the Physics of the B Factories book. This uses the notation alpha, beta, gamma for the angles of the Unitarity Triangle. The nominal version uses the notation phi_1, phi_2 and phi_3. Please cite this work as Eur. Phys. J. C74 (2014) 3026.

Submitted 31 October, 2015; v1 submitted 24 June, 2014; originally announced June 2014.

Comments: 928 pages, version 3 (arXiv:1406.6311v3) corresponds to the alpha, beta, gamma version of the book, the other versions use the phi1, phi2, phi3 notation

Report number: SLAC-PUB-15968, KEK Preprint 2014-3

Journal ref: Eur. Phys. J. C74 (2014) 3026

arXiv:0903.4195 [pdf, ps, other]

Kondo Vortices, Zero Modes, and Magnetic Ordering in a Kondo Lattice Model

Authors: Saeed Saremi, Patrick A. Lee, T. Senthil

Abstract: Motivated by the mysteries of the heavy fermion quantum critical point, we investigate the competition between Kondo screening and magnetic ordering in the honeycomb Kondo lattice at half filling. We examine the destruction of the Kondo phase by proliferating vortex configurations in the Kondo hybridization order parameter. We find that there are zero modes associated with Kondo vortices. Condensing these vortices can lead to the antiferromagnetic phase.

Submitted 25 March, 2009; originally announced March 2009.

Comments: 4 pages, 2 figures, 4 tables

arXiv:0705.0187 [pdf, ps, other]

doi 10.1103/PhysRevB.76.184430

RKKY in half-filled bipartite lattices: graphene as an example

Authors: Saeed Saremi

Abstract: We first present a simple proof that for any bipartite lattice at half filling the RKKY interaction is antiferromagnetic between impurities on opposite (i.e., A and B) sublattices and is ferromagnetic between impurities on the same sublattices. This result is valid on all length scales. We then focus on the honeycomb lattice and examine the theorem in the long distance limit by performing the low energy calculation using Dirac electrons. To find the universal (cutoff free) result we perform the calculation in smooth cutoff schemes, as we show that the calculation based on a sharp cutoff leads to wrong results. We also find the long distance behavior of the RKKY interaction between "plaquette" impurities in both coherent and incoherent regimes.

Submitted 26 November, 2007; v1 submitted 2 May, 2007; originally announced May 2007.

Comments: v3. The published version. 6 pages, 1 figure

Journal ref: Phys. Rev. B 76, 184430 (2007) (6 pages)

arXiv:cond-mat/0610273 [pdf, ps, other]

doi 10.1103/PhysRevB.75.165110

Quantum critical point in the Kondo-Heisenberg model on the honeycomb lattice

Authors: Saeed Saremi, Patrick A. Lee

Abstract: We study the Kondo--Heisenberg model on the honeycomb lattice at half-filling. Due to the vanishing of the density of states at the Fermi level, the Kondo insulator disappears at a finite Kondo coupling even in the absence of the Heisenberg exchange. We adopt a large-N formulation of this model and use the renormalization group machinery to study systematically the second order phase transition of the Kondo insulator (KI) to the algebraic spin liquid (ASL). We note that neither phase breaks any physical symmetry, so that the transition is not described by the standard Ginzburg-Landau-Wilson critical point. We find a stable Lorentz-invariant fixed point that controls this second order phase transition. We calculate the exponent $ν$ of the diverging length scale near the transition. The quasi-particle weight of the conduction electron vanishes at this KI--ASL fixed point, indicating non-Fermi liquid behavior. The algebraic decay exponent of the staggered spin correlation is calculated at the fixed point and in the ASL phase. We find a jump in this exponent at the transition point.

Submitted 1 May, 2007; v1 submitted 10 October, 2006; originally announced October 2006.

Comments: The published version. New title. Very minor changes in the abstract and introduction compared to the original version

Journal ref: Phys. Rev. B 75, 165110 (2007)

Showing 1–27 of 27 results for author: Saremi, S