-
Expectation-maximization for multi-reference alignment: Two pitfalls and one remedy
Authors:
Amnon Balanov,
Wasim Huleihel,
Tamir Bendory
Abstract:
We study the multi-reference alignment model, which involves recovering a signal from noisy observations that have been randomly transformed by an unknown group action, a fundamental challenge in statistical signal processing, computational imaging, and structural biology. While much of the theoretical literature has focused on the asymptotic sample complexity of this model, the practical performance of reconstruction algorithms, particularly of the omnipresent expectation maximization (EM) algorithm, remains poorly understood.
In this work, we present a detailed investigation of EM in the challenging low signal-to-noise ratio (SNR) regime. We identify and characterize two failure modes that emerge in this setting. The first, called Einstein from Noise, reveals a strong sensitivity to initialization, with reconstructions resembling the input template regardless of the true underlying signal. The second phenomenon, referred to as the Ghost of Newton, involves EM initially converging towards the correct solution but later diverging, leading to a loss of reconstruction fidelity. We provide theoretical insights and support our findings through numerical experiments. Finally, we introduce a simple, yet effective modification to EM based on mini-batching, which mitigates the above artifacts. Supported by both theory and experiments, this mini-batching approach processes small data subsets per iteration, reducing initialization bias and computational cost, while maintaining accuracy comparable to full-batch EM.
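The EM iteration and the mini-batching remedy can be made concrete in a toy setting. The sketch below is a minimal NumPy illustration, not the authors' implementation. It assumes a 1-D signal acted on by cyclic shifts, a uniform distribution over shifts, and a known noise level. `em_mra`, `aligned_error` and the `batch_size` parameter are hypothetical names. The usage runs at high SNR only as a correctness check, because the two failure modes discussed above appear at low SNR.

```python
import numpy as np


def em_mra(Y, sigma, x0, n_iter=30, batch_size=None, seed=0):
    """EM for a toy 1-D MRA model y_i = roll(x, s_i) + N(0, sigma^2 I).

    Assumptions: cyclic shifts, uniform prior over shifts, known sigma.
    batch_size (hypothetical parameter): if set, each iteration uses a fresh
    random mini-batch of observations instead of the full data set.
    """
    rng = np.random.default_rng(seed)
    N, L = Y.shape
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        if batch_size is None:
            Yb = Y
        else:
            Yb = Y[rng.choice(N, size=batch_size, replace=False)]
        # E-step: ||roll(x, s)|| does not depend on s, so the posterior over
        # shifts is a softmax of <y_i, roll(x, s)> / sigma^2.
        shifted_x = np.stack([np.roll(x, s) for s in range(L)])          # (L, L)
        logits = Yb @ shifted_x.T / sigma**2                               # (B, L)
        logits -= logits.max(axis=1, keepdims=True)                        # avoid overflow
        W = np.exp(logits)
        W /= W.sum(axis=1, keepdims=True)
        # M-step: posterior-weighted average of the observations with each shift undone.
        unshifted = np.stack([np.roll(Yb, -s, axis=1) for s in range(L)], axis=1)  # (B, L, L)
        x = np.einsum("is,isn->n", W, unshifted) / Yb.shape[0]
    return x


def aligned_error(est, x_true):
    """Relative error after the best cyclic shift (the shift itself is unidentifiable)."""
    errs = [np.linalg.norm(np.roll(est, s) - x_true) for s in range(len(x_true))]
    return min(errs) / np.linalg.norm(x_true)


# High-SNR sanity check of both variants.
rng = np.random.default_rng(1)
L, N, sigma = 8, 500, 0.1
x_true = rng.standard_normal(L)
shifts = rng.integers(0, L, size=N)
Y = np.stack([np.roll(x_true, s) for s in shifts]) + sigma * rng.standard_normal((N, L))
x_full = em_mra(Y, sigma, x0=rng.standard_normal(L))
x_mini = em_mra(Y, sigma, x0=rng.standard_normal(L), batch_size=100)
```

In this sketch the mini-batch variant touches only `batch_size` observations per iteration. That is where the per-iteration cost saving comes from.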
Submitted 27 May, 2025;
originally announced May 2025.
-
The stability of generalized phase retrieval problem over compact groups
Authors:
Tal Amir,
Tamir Bendory,
Nadav Dym,
Dan Edidin
Abstract:
The generalized phase retrieval problem over compact groups aims to recover a set of matrices, representing an unknown signal, from their associated Gram matrices, leveraging prior structural knowledge about the signal. This framework generalizes the classical phase retrieval problem, which reconstructs a signal from the magnitudes of its Fourier transform, to a richer setting involving non-abelian compact groups. In this broader context, the unknown phases in Fourier space are replaced by unknown orthogonal matrices that arise from the action of a compact group on a finite-dimensional vector space. This problem is primarily motivated by advances in electron microscopy for determining the 3D structure of biological macromolecules from highly noisy observations. To capture realistic assumptions from machine learning and signal processing, we model the signal as belonging to one of several broad structural families: a generic linear subspace, a sparse representation in a generic basis, the output of a generic ReLU neural network, or a generic low-dimensional manifold. Our main result shows that, under mild conditions, the generalized phase retrieval problem not only admits a unique solution (up to inherent group symmetries) but also satisfies a bi-Lipschitz property. This implies robustness to both noise and model mismatch, an essential requirement for practical use, especially when measurements are severely corrupted by noise. These findings provide theoretical support for a wide class of scientific problems under modern structural assumptions, and they offer strong foundations for developing robust algorithms in high-noise regimes.
Submitted 12 May, 2025; v1 submitted 7 May, 2025;
originally announced May 2025.
-
Provable algorithms for multi-reference alignment over $\mathrm{SO}(2)$
Authors:
Gil Drozatz,
Tamir Bendory,
Nir Sharon
Abstract:
The multi-reference alignment (MRA) problem involves reconstructing a signal from multiple noisy observations, each transformed by a random group element. In this paper, we focus on the group \(\mathrm{SO}(2)\) of in-plane rotations and propose two computationally efficient algorithms with theoretical guarantees for accurate signal recovery under a non-uniform distribution over the group. The first algorithm exploits the spectral properties of the second moment of the data, while the second utilizes the frequency marching principle. Both algorithms achieve the optimal estimation rate in high-noise regimes, marking a significant advancement in the development of computationally efficient and statistically optimal methods for estimation problems over groups.
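The frequency-marching principle is easiest to see in the classical discrete setting with uniformly distributed cyclic shifts. In that setting, Fourier phases are recovered one frequency at a time from the bispectrum, a third-order shift invariant. The sketch below shows that classical special case only. The paper's algorithm works over $\mathrm{SO}(2)$ with a non-uniform distribution, and it is not reproduced here. `bispectrum` and `frequency_marching` are hypothetical helper names. The sketch also assumes all Fourier coefficients are nonzero.

```python
import numpy as np


def bispectrum(x):
    """B[k1, k2] = X[k1] X[k2] conj(X[k1 + k2]) (indices mod L); invariant to cyclic shifts."""
    X = np.fft.fft(x)
    L = len(x)
    k = np.arange(L)
    return X[:, None] * X[None, :] * np.conj(X[(k[:, None] + k[None, :]) % L])


def frequency_marching(B, magnitudes, phi1):
    """Recover Fourier phases one frequency at a time from B[k, 1].

    arg B[k, 1] = phi[k] + phi[1] - phi[k + 1], hence
    phi[k + 1] = phi[k] + phi[1] - arg B[k, 1].
    phi1, the phase at frequency 1, is a free choice that corresponds to a
    (possibly fractional) translation of the signal.
    """
    L = len(magnitudes)
    phi = np.zeros(L)
    phi[0] = np.angle(B[0, 0])  # B[0, 0] = |X[0]|^2 X[0], so its angle is that of X[0]
    phi[1] = phi1
    for k in range(1, L - 1):
        phi[k + 1] = phi[k] + phi[1] - np.angle(B[k, 1])
    return np.real(np.fft.ifft(magnitudes * np.exp(1j * phi)))


rng = np.random.default_rng(0)
x = rng.standard_normal(9)
X = np.fft.fft(x)
B = bispectrum(x)
# Using the true phase at frequency 1 pins down the translation, so x is recovered exactly.
x_rec = frequency_marching(B, np.abs(X), np.angle(X[1]))
```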
Submitted 6 May, 2025; v1 submitted 27 April, 2025;
originally announced April 2025.
-
A note on the sample complexity of multi-target detection
Authors:
Amnon Balanov,
Shay Kreymer,
Tamir Bendory
Abstract:
This work studies the sample complexity of the multi-target detection (MTD) problem, which involves recovering a signal from a noisy measurement containing multiple instances of a target signal in unknown locations, each transformed by a random group element. This problem is primarily motivated by single-particle cryo-electron microscopy (cryo-EM), a groundbreaking technology for determining the structures of biological molecules. We establish upper and lower bounds for various MTD models in the high-noise regime as a function of the group, the distribution over the group, and the arrangement of signal occurrences within the measurement. The lower bounds are established through a reduction to the related multi-reference alignment problem, while the upper bounds are derived from explicit recovery algorithms utilizing autocorrelation analysis. These findings provide fundamental insights into estimation limits in noisy environments and lay the groundwork for extending this analysis to more complex applications, such as cryo-EM.
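The autocorrelation-based recovery mentioned above rests on one property: low-order autocorrelations of the whole measurement can be estimated without locating any occurrence. The sketch below assumes the simplest hypothetical case, a 1-D signal with the trivial group and copies separated by at least 2L-1 samples. In that case the empirical autocorrelations approach $\gamma a_x[\ell] + \sigma^2\delta[\ell]$, where $\gamma$ is the density of occurrences. The group action and the estimators analyzed in the paper are not modeled. `toy_mtd_measurement` and `empirical_autocorrelations` are hypothetical names.

```python
import numpy as np


def empirical_autocorrelations(y, num_lags):
    """a_y[l] = (1/N) * sum_n y[n] * y[n + l] for l = 0, ..., num_lags - 1."""
    N = len(y)
    return np.array([np.dot(y[: N - l], y[l:]) / N for l in range(num_lags)])


def toy_mtd_measurement(x, num_copies, spacing, sigma, rng):
    """Hypothetical 1-D MTD measurement (trivial group): copies of x every `spacing`
    samples plus white Gaussian noise. spacing >= 2 * len(x) - 1 keeps distinct copies
    from interacting at lags below len(x). The estimator never uses the locations."""
    L = len(x)
    y = np.zeros(num_copies * spacing)
    for j in range(num_copies):
        y[j * spacing : j * spacing + L] += x
    return y + sigma * rng.standard_normal(y.size)


rng = np.random.default_rng(0)
x = np.array([1.0, -0.5, 2.0, 0.3, -1.2])
L, sigma = len(x), 1.0
y = toy_mtd_measurement(x, num_copies=20000, spacing=15, sigma=sigma, rng=rng)

a_y = empirical_autocorrelations(y, L)
a_x = np.array([np.dot(x[: L - l], x[l:]) for l in range(L)])  # signal autocorrelations
gamma = 20000 / y.size                                          # density of occurrences
predicted = gamma * a_x + sigma**2 * (np.arange(L) == 0)        # noise bias only at lag 0
```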
Submitted 21 January, 2025;
originally announced January 2025.
-
The generalized phase retrieval problem over compact groups
Authors:
Tamir Bendory,
Dan Edidin
Abstract:
The classical phase retrieval problem involves estimating a signal from its Fourier magnitudes (power spectrum) by leveraging prior information about the desired signal. This paper extends the problem to compact groups, addressing the recovery of a set of matrices from their Gram matrices. In this broader context, the missing phases in Fourier space are replaced by missing unitary or orthogonal matrices arising from the action of a compact group on a finite-dimensional vector space. This generalization is driven by applications in multi-reference alignment and single-particle cryo-electron microscopy, a pivotal technology in structural biology. We define the generalized phase retrieval problem over compact groups and explore its underlying algebraic structure. We survey recent results on the uniqueness of solutions, focusing on the significant class of semialgebraic priors. Furthermore, we present a family of algorithms inspired by classical phase retrieval techniques. Finally, we propose a conjecture on the stability of the problem based on bi-Lipschitz analysis, supported by numerical experiments.
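The "missing orthogonal matrix" in the generalized problem can be seen with a single matrix. A Gram matrix $G = AA^\top$ determines $A$ only up to right multiplication by an orthogonal matrix. The sketch below factors $G$ and checks that the result differs from $A$ by an orthogonal matrix. It illustrates only the basic ambiguity. The paper's setting, a set of matrices tied to a group representation with priors that resolve the ambiguity, is not modeled. `factor_gram` is a hypothetical name.

```python
import numpy as np


def factor_gram(G):
    """Return some A with A @ A.T == G for a symmetric positive semidefinite G.

    Uses G = V diag(w) V^T and returns V diag(sqrt(w)). Any A @ Q with Q
    orthogonal has the same Gram matrix, so A is recovered only up to Q.
    """
    w, V = np.linalg.eigh(G)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return V * np.sqrt(w)      # scales column j of V by sqrt(w[j])


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
G = A @ A.T
B = factor_gram(G)
Q = np.linalg.solve(A, B)  # B = A Q, with Q orthogonal since A is invertible
```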
Submitted 7 January, 2025;
originally announced January 2025.
-
Bayesian Perspective for Orientation Estimation in Cryo-EM and Cryo-ET
Authors:
Sheng Xu,
Amnon Balanov,
Tamir Bendory
Abstract:
Accurate orientation estimation is a crucial component of 3D molecular structure reconstruction, both in single-particle cryo-electron microscopy (cryo-EM) and in the increasingly popular field of cryo-electron tomography (cryo-ET). The dominant method, which involves searching for an orientation with maximum cross-correlation relative to given templates, falls short, particularly in low signal-to-noise environments. In this work, we propose a Bayesian framework to develop a more accurate and flexible orientation estimation approach, with the minimum mean square error (MMSE) estimator as a key example. This method effectively accommodates varying structural conformations and arbitrary rotational distributions. Through simulations, we demonstrate that our estimator consistently outperforms the cross-correlation-based method, especially in challenging conditions with low signal-to-noise ratios, and offer a theoretical framework to support these improvements. We further show that integrating our estimator into the iterative refinement in the 3D reconstruction pipeline markedly enhances overall accuracy, revealing substantial benefits across the algorithmic workflow. Finally, we show empirically that the proposed Bayesian approach enhances robustness against the "Einstein from Noise" phenomenon, reducing model bias and improving reconstruction reliability. These findings indicate that the proposed Bayesian framework could substantially advance cryo-EM and cryo-ET by enhancing the accuracy, robustness, and reliability of 3D molecular structure reconstruction, thereby facilitating deeper insights into complex biological systems.
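The contrast between the cross-correlation estimate and a Bayesian estimate can be shown in a toy 1-D analogue, where the "orientation" is a cyclic shift. The sketch below is not the cryo-EM/cryo-ET estimator from the paper. It assumes white Gaussian noise with known level and a single known template. It computes the posterior over shifts, the maximum cross-correlation estimate, and an MMSE-type angle: the posterior mean of $e^{2\pi i s/L}$ projected back onto the circle, which minimizes the expected squared chordal error among unit-modulus estimates. All function names are hypothetical.

```python
import numpy as np


def shift_posterior(y, template, sigma, prior=None):
    """Posterior p(s | y) for y = roll(template, s) + N(0, sigma^2 I) over cyclic shifts s."""
    L = len(template)
    prior = np.full(L, 1.0 / L) if prior is None else np.asarray(prior, dtype=float)
    corr = np.array([np.dot(y, np.roll(template, s)) for s in range(L)])
    logp = corr / sigma**2 + np.log(prior)  # ||roll(template, s)|| is the same for all s
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()


def cross_correlation_estimate(y, template):
    """Maximum cross-correlation estimate: argmax_s <y, roll(template, s)>."""
    L = len(template)
    return int(np.argmax([np.dot(y, np.roll(template, s)) for s in range(L)]))


def circular_mmse_angle(p):
    """Angle (radians, in [0, 2*pi)) of the posterior mean of exp(2*pi*i*s/L)."""
    L = len(p)
    m = np.sum(p * np.exp(2j * np.pi * np.arange(L) / L))
    return np.angle(m) % (2 * np.pi)


rng = np.random.default_rng(0)
L, s_true, sigma = 16, 5, 0.1
template = rng.standard_normal(L)
y = np.roll(template, s_true) + sigma * rng.standard_normal(L)
p = shift_posterior(y, template, sigma)
s_cc = cross_correlation_estimate(y, template)
theta_mmse = circular_mmse_angle(p)
```

At this high SNR both estimates agree. They diverge when the posterior is spread over several shifts. In that regime the MMSE-type estimate averages over the uncertainty, while the cross-correlation estimate commits to a single peak.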
Submitted 4 December, 2024;
originally announced December 2024.
-
Confirmation Bias in Gaussian Mixture Models
Authors:
Amnon Balanov,
Tamir Bendory,
Wasim Huleihel
Abstract:
Confirmation bias, the tendency to interpret information in a way that aligns with one's preconceptions, can profoundly impact scientific research, leading to conclusions that reflect the researcher's hypotheses even when the observational data do not support them. This issue is especially critical in scientific fields involving highly noisy observations, such as cryo-electron microscopy.
This study investigates confirmation bias in Gaussian mixture models. We consider the following experiment: A team of scientists assumes they are analyzing data drawn from a Gaussian mixture model with known signals (hypotheses) as centroids. However, in reality, the observations consist entirely of noise without any informative structure. The researchers use a single iteration of the K-means or expectation-maximization algorithms, two popular algorithms for estimating the centroids. Despite the observations being pure noise, we show that these algorithms yield biased estimates that resemble the initial hypotheses, contradicting the unbiased expectation that averaging these noise observations would converge to zero. Namely, the algorithms generate estimates that mirror the postulated model, although the hypotheses (the presumed centroids of the Gaussian mixture) are not evident in the observations. Specifically, among other results, we prove a positive correlation between the estimates produced by the algorithms and the corresponding hypotheses. We also derive explicit closed-form expressions of the estimates for both finite and infinite numbers of hypotheses. This study underscores the risks of confirmation bias in low signal-to-noise environments, provides insights into potential pitfalls in scientific methodologies, and highlights the importance of prudent data interpretation.
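The experiment described above is straightforward to simulate. The sketch below is an illustration under stated assumptions, not the paper's analysis. It assumes equal mixture weights, unit noise variance, and K orthonormal hypotheses. It runs one K-means step and one EM step, both initialized at the hypotheses, on pure Gaussian noise and measures each estimate's cosine similarity with its hypothesis. Function names are hypothetical.

```python
import numpy as np


def one_step_kmeans(Y, hypotheses):
    """One K-means iteration initialised at the hypotheses: nearest-centroid assignment,
    then the mean of each cluster."""
    d2 = ((Y[:, None, :] - hypotheses[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    labels = d2.argmin(axis=1)
    return np.stack([Y[labels == k].mean(axis=0) for k in range(len(hypotheses))])


def one_step_em(Y, hypotheses, sigma=1.0):
    """One EM iteration for a GMM with equal weights and known isotropic noise sigma."""
    logits = -((Y[:, None, :] - hypotheses[None, :, :]) ** 2).sum(axis=2) / (2 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)                # responsibilities, (N, K)
    return (W.T @ Y) / W.sum(axis=0)[:, None]        # weighted means, (K, d)


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
d, K, N = 20, 3, 3000
hypotheses = np.linalg.qr(rng.standard_normal((d, K)))[0].T  # K orthonormal "hypotheses"
noise = rng.standard_normal((N, d))                          # pure noise: no signal at all
est_km = one_step_kmeans(noise, hypotheses)
est_em = one_step_em(noise, hypotheses)
sims_km = [cosine(est_km[k], hypotheses[k]) for k in range(K)]
sims_em = [cosine(est_em[k], hypotheses[k]) for k in range(K)]
```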
Submitted 19 August, 2024;
originally announced August 2024.
-
Einstein from Noise: Statistical Analysis
Authors:
Amnon Balanov,
Wasim Huleihel,
Tamir Bendory
Abstract:
"Einstein from noise" (EfN) is a prominent example of the model bias phenomenon: systematic errors in the statistical model that lead to spurious but consistent estimates. In the EfN experiment, one falsely believes that a set of observations contains noisy, shifted copies of a template signal (e.g., an Einstein image), whereas in reality, it contains only pure noise observations. To estimate the signal, the observations are first aligned with the template using cross-correlation, and then averaged. Although the observations contain nothing but noise, it was recognized early on that this process produces a signal that resembles the template signal! This pitfall was at the heart of a central scientific controversy about validation techniques in structural biology.
This paper provides a comprehensive statistical analysis of the EfN phenomenon above. We show that the Fourier phases of the EfN estimator (namely, the average of the aligned noise observations) converge to the Fourier phases of the template signal, explaining the observed structural similarity. Additionally, we prove that the convergence rate is inversely proportional to the number of noise observations and, in the high-dimensional regime, to the Fourier magnitudes of the template signal. Moreover, in the high-dimensional regime, the Fourier magnitudes converge to a scaled version of the template signal's Fourier magnitudes. This work not only deepens the theoretical understanding of the EfN phenomenon but also highlights potential pitfalls in template matching techniques and emphasizes the need for careful interpretation of noisy observations across disciplines in engineering, statistics, physics, and biology.
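The align-then-average procedure above can be reproduced in a toy 1-D form. The sketch below assumes cyclic shifts, alignment by maximal circular cross-correlation, and a random Gaussian vector standing in for the Einstein image. `einstein_from_noise` is a hypothetical name. Even though every input is pure noise, the average correlates strongly with the template.

```python
import numpy as np


def einstein_from_noise(observations, template):
    """Align each observation to the template by maximal circular cross-correlation,
    then average the aligned observations (the EfN estimator)."""
    L = len(template)
    shifted_templates = np.stack([np.roll(template, s) for s in range(L)])  # (L, L)
    best = (observations @ shifted_templates.T).argmax(axis=1)              # best shift per row
    # <obs, roll(template, s)> == <roll(obs, -s), template>, so undo the best shift.
    aligned = np.stack([np.roll(obs, -s) for obs, s in zip(observations, best)])
    return aligned.mean(axis=0)


rng = np.random.default_rng(0)
L, N = 64, 1000
template = rng.standard_normal(L)      # stand-in for the "Einstein" image
noise = rng.standard_normal((N, L))    # observations with no signal at all
estimate = einstein_from_noise(noise, template)
similarity = np.dot(estimate, template) / (np.linalg.norm(estimate) * np.linalg.norm(template))
```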
Submitted 11 June, 2025; v1 submitted 7 July, 2024;
originally announced July 2024.
-
A transversality theorem for semi-algebraic sets with application to signal recovery from the second moment and cryo-EM
Authors:
Tamir Bendory,
Nadav Dym,
Dan Edidin,
Arun Suresh
Abstract:
Semi-algebraic priors are ubiquitous in signal processing and machine learning. Prevalent examples include a) linear models where the signal lies in a low-dimensional subspace; b) sparse models where the signal can be represented by only a few coefficients under a suitable basis; and c) a large family of neural network generative models. In this paper, we prove a transversality theorem for semi-algebraic sets in orthogonal or unitary representations of groups: with a suitable dimension bound, a generic translate of any semi-algebraic set is transverse to the orbits of the group action. This, in turn, implies that if a signal lies in a low-dimensional semi-algebraic set, then it can be recovered uniquely from measurements that separate orbits.
As an application, we consider the implications of the transversality theorem for the problem of recovering signals that are translated by random group actions from their second moment. As a special case, we discuss cryo-EM: a leading technology for determining the spatial structure of biological molecules, which serves as our prime motivation. In particular, we derive explicit bounds for recovering a molecular structure from the second moment under a semi-algebraic prior and deduce information-theoretic implications. We also obtain information-theoretic bounds for three additional applications: factoring Gram matrices, multi-reference alignment, and phase retrieval. Finally, we deduce bounds for designing permutation-invariant separators in machine learning.
Submitted 10 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Object detection under the linear subspace model with application to cryo-EM images
Authors:
Amitay Eldar,
Keren Mor Waknin,
Samuel Davenport,
Tamir Bendory,
Armin Schwartzman,
Yoel Shkolnisky
Abstract:
Detecting multiple unknown objects in noisy data is a key problem in many scientific fields, such as electron microscopy imaging. A common model for the unknown objects is the linear subspace model, which assumes that the objects can be expanded in some known basis (such as the Fourier basis). In this paper, we develop an object detection algorithm that, under the linear subspace model, is asymptotically guaranteed to detect all objects while controlling the family-wise error rate or the false discovery rate. Numerical simulations show that the algorithm also controls the error rate with high power in the non-asymptotic regime, even in highly challenging settings. We apply the proposed algorithm to an experimental electron microscopy data set and show that it outperforms existing standard software.
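The two error criteria named above can be made concrete with the two textbook corrections applied to a vector of p-values. Bonferroni controls the family-wise error rate (FWER), and the Benjamini-Hochberg step-up procedure controls the false discovery rate (FDR). The sketch below shows only these standard procedures. The paper's detection statistics and its specific multiple-testing procedure are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def bonferroni(pvalues, alpha=0.05):
    """Indices rejected by the Bonferroni correction (controls the family-wise error rate)."""
    p = np.asarray(pvalues, dtype=float)
    return np.nonzero(p <= alpha / len(p))[0]


def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m   # p_(i) <= i q / m, i = 1..m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0])                  # largest such i (0-based)
    return np.sort(order[: k + 1])                    # reject the k + 1 smallest p-values


p = [0.001, 0.012, 0.02, 0.5]
rejected_bonf = bonferroni(p, alpha=0.05)      # indices 0, 1
rejected_bh = benjamini_hochberg(p, q=0.05)    # indices 0, 1, 2
```

The FDR criterion is less conservative, so it typically yields more detections than FWER control at the same nominal level, as in this example.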
Submitted 1 May, 2024;
originally announced May 2024.
-
The beltway problem over orthogonal groups
Authors:
Tamir Bendory,
Dan Edidin,
Oscar Mickelin
Abstract:
The classical beltway problem entails recovering a set of points from their unordered pairwise distances on the circle. This problem can be viewed as a special case of the crystallographic phase retrieval problem of recovering a sparse signal from its periodic autocorrelation. Based on this interpretation, and motivated by cryo-electron microscopy, we suggest a natural generalization to orthogonal groups: recovering a sparse signal, up to an orthogonal transformation, from its autocorrelation over the orthogonal group. If the support of the signal is collision-free, we bound the number of solutions to the beltway problem over orthogonal groups, and prove that this bound is exactly one when the support of the signal is radially collision-free (i.e., the support points have distinct magnitudes). We also prove that if the pairwise products of the signal's weights are distinct, then the autocorrelation determines the signal uniquely, up to an orthogonal transformation. We conclude the paper by considering binary signals and showing that in this case, the collision-free condition need not be sufficient to determine signals up to orthogonal transformation.
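The link between the classical beltway problem and periodic autocorrelation can be checked directly. For a binary indicator on $\mathbb{Z}_N$, the periodic autocorrelation at lag $\ell$ counts the ordered pairs of points at circular difference $\ell$. The sketch below recovers candidate point sets from that autocorrelation by exhaustive search, which is feasible only for tiny N. It illustrates the classical circle version only, not the orthogonal-group generalization or any algorithm from the paper. Function names are hypothetical.

```python
import itertools
import numpy as np


def periodic_autocorrelation(x):
    """a[l] = sum_n x[n] * x[(n + l) mod N]."""
    return np.array([np.dot(x, np.roll(x, -l)) for l in range(len(x))])


def indicator(points, N):
    x = np.zeros(N, dtype=int)
    x[list(points)] = 1
    return x


def beltway_brute_force(a):
    """All point sets on Z_N that contain 0 and whose indicator has periodic autocorrelation a.

    a[0] is the number of points. Anchoring a point at 0 removes the rotation ambiguity
    only partially: reflections and re-anchoring at another point still give extra
    solutions. Exhaustive search, so only viable for small N.
    """
    a = np.asarray(a)
    N, k = len(a), int(a[0])
    solutions = []
    for rest in itertools.combinations(range(1, N), k - 1):
        pts = (0,) + rest
        if np.array_equal(periodic_autocorrelation(indicator(pts, N)), a):
            solutions.append(pts)
    return solutions


N = 16
true_points = (2, 3, 7, 12)
a = periodic_autocorrelation(indicator(true_points, N))
solutions = beltway_brute_force(a)
anchored = tuple(sorted((p - true_points[0]) % N for p in true_points))  # (0, 1, 5, 10)
```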
Submitted 28 July, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Image detection using combinatorial auction
Authors:
Simon Anuk,
Tamir Bendory,
Amichai Painsky
Abstract:
This paper studies the optimal solution of the classical problem of detecting the location of multiple image occurrences in a two-dimensional, noisy measurement. Assuming the image occurrences do not overlap, we formulate this task as a constrained maximum likelihood optimization problem. We show that the maximum likelihood estimator is equivalent to an instance of the winner determination problem from the field of combinatorial auction and that the solution can be obtained by searching over a binary tree. We then design a pruning mechanism that significantly accelerates the runtime of the search. We demonstrate on simulations and electron microscopy data sets that the proposed algorithm provides accurate detection in challenging regimes of high noise levels and densely packed image occurrences.
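In one dimension, the same constrained maximum-likelihood problem (non-overlapping occurrences, white Gaussian noise) reduces to weighted interval scheduling, which dynamic programming solves exactly. The sketch below uses that swapped-in 1-D technique. It is not the paper's 2-D winner-determination formulation or its pruned binary-tree search. It assumes a known template, known noise level, and an unknown number of occurrences, so an occurrence is kept only if it increases the likelihood. All function names are hypothetical.

```python
import itertools
import numpy as np


def occurrence_scores(y, x, sigma):
    """Log-likelihood gain of an occurrence of template x starting at each position p,
    relative to 'noise only', under white Gaussian noise with known sigma."""
    L = len(x)
    return np.array([(np.dot(y[p : p + L], x) - 0.5 * np.dot(x, x)) / sigma**2
                     for p in range(len(y) - L + 1)])


def detect_nonoverlapping(scores, L):
    """Exact maximiser of sum(scores[p] for p in chosen), where chosen starts are at least
    L apart (occurrences occupy [p, p + L) and may not overlap)."""
    P = len(scores)
    best = np.zeros(P + L)           # best[p] = optimal value using starts >= p
    take = np.zeros(P, dtype=bool)   # whether the optimum from p places an occurrence at p
    for p in range(P - 1, -1, -1):
        with_p = scores[p] + best[p + L]
        take[p] = with_p > best[p + 1]
        best[p] = with_p if take[p] else best[p + 1]
    chosen, p = [], 0
    while p < P:                     # walk forward along the optimal decisions
        if take[p]:
            chosen.append(p)
            p += L
        else:
            p += 1
    return best[0], chosen


def brute_force_nonoverlapping(scores, L):
    """Exhaustive reference solution, for checking the DP on tiny inputs only."""
    P, best = len(scores), 0.0
    for r in range(P + 1):
        for subset in itertools.combinations(range(P), r):
            if all(b - a >= L for a, b in zip(subset, subset[1:])):
                best = max(best, float(sum(scores[i] for i in subset)))
    return best


# Noiseless example: two occurrences of a length-3 template.
x = np.array([1.0, 2.0, 1.0])
y = np.zeros(30)
y[5:8] += x
y[20:23] += x
value, chosen = detect_nonoverlapping(occurrence_scores(y, x, sigma=1.0), len(x))
```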
Submitted 30 July, 2024; v1 submitted 21 January, 2024;
originally announced January 2024.
-
Score-based diffusion priors for multi-target detection
Authors:
Alon Zabatani,
Shay Kreymer,
Tamir Bendory
Abstract:
Multi-target detection (MTD) is the problem of estimating an image from a large, noisy measurement that contains randomly translated and rotated copies of the image. Motivated by single-particle cryo-electron microscopy, we design data-driven diffusion priors for the MTD problem, derived from score-based stochastic differential equation models. We then integrate the prior into the approximate expectation-maximization algorithm. In particular, our method alternates between an expectation step that approximates the expected log-likelihood and a maximization step that balances the approximated log-likelihood with the learned log-prior. We show on two datasets that adding the data-driven prior substantially reduces the estimation error, particularly in high-noise regimes.
Submitted 13 December, 2023;
originally announced December 2023.
-
Phase retrieval with semi-algebraic and ReLU neural network priors
Authors:
Tamir Bendory,
Nadav Dym,
Dan Edidin,
Arun Suresh
Abstract:
The key ingredient in retrieving a signal from its Fourier magnitudes, namely, in solving the phase retrieval problem, is an effective prior on the sought signal. In this paper, we study the phase retrieval problem under the prior that the signal lies in a semi-algebraic set. This is a very general prior, as semi-algebraic sets include linear models, sparse models, and ReLU neural network generative models. The latter is the main motivation of this paper, due to the remarkable success of deep generative models in a variety of imaging tasks, including phase retrieval. We prove that almost all signals in $\mathbb{R}^N$ can be determined from their Fourier magnitudes, up to a sign, if they lie in a (generic) semi-algebraic set of dimension N/2. The same is true for all signals if the semi-algebraic set is of dimension N/4. We also generalize these results to the problem of signal recovery from the second moment in multi-reference alignment models with multiplicity-free representations of compact groups. This general result is then used to derive improved sample complexity bounds for recovering band-limited functions on the sphere from their noisy copies, each acted upon by a random element of SO(3).
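A few lines of NumPy show why a prior is indispensable: some signal transformations leave the Fourier magnitudes unchanged. The sketch below assumes the periodic DFT, an assumption made for illustration since the exact measurement model of the paper is not specified here. Under that assumption a global sign flip, a cyclic shift, and a reflection all preserve the magnitudes. For real signals the magnitudes are also symmetric, so only N // 2 + 1 of them carry independent information. Only the sign ambiguity survives in the generic setting of the result above. `fourier_magnitudes` is a hypothetical helper.

```python
import numpy as np


def fourier_magnitudes(x):
    """|DFT(x)|: the phase retrieval measurements in this sketch (periodic DFT assumed)."""
    return np.abs(np.fft.fft(x))


rng = np.random.default_rng(0)
N = 10
x = rng.standard_normal(N)
m = fourier_magnitudes(x)

# Signals that the measurements alone cannot tell apart from x:
sign_flip = fourier_magnitudes(-x)                # global sign
cyclic_shift = fourier_magnitudes(np.roll(x, 3))  # circular translation
reflection = fourier_magnitudes(np.flip(x))       # time reversal

# For real x, |X[k]| = |X[(N - k) % N]|, so only N // 2 + 1 magnitudes are independent.
mirrored = m[(-np.arange(N)) % N]
```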
Submitted 29 April, 2025; v1 submitted 15 November, 2023;
originally announced November 2023.
-
$\mathrm{SE}(3)$ Synchronization by Eigenvectors of Dual Quaternion Matrices
Authors:
Ido Hadi,
Tamir Bendory,
Nir Sharon
Abstract:
In synchronization problems, the goal is to estimate elements of a group from noisy measurements of their ratios. A popular estimation method for synchronization is the spectral method. It extracts the group elements from eigenvectors of a block matrix formed from the measurements. The eigenvectors must be projected, or "rounded", onto the group. The rounding procedures are constructed ad hoc and increasingly so when applied to synchronization problems over non-compact groups.
In this paper, we develop a spectral approach to synchronization over the non-compact group $\mathrm{SE}(3)$, the group of rigid motions of $\mathbb{R}^3$. We base our method on embedding $\mathrm{SE}(3)$ into the algebra of dual quaternions, which has deep algebraic connections with the group $\mathrm{SE}(3)$. These connections suggest a natural rounding procedure considerably more straightforward than the current state of the art for spectral $\mathrm{SE}(3)$ synchronization, which uses a matrix embedding of $\mathrm{SE}(3)$. We show by numerical experiments that our approach yields results comparable to the current state of the art in $\mathrm{SE}(3)$ synchronization via the spectral method. Thus, our approach reaps the benefits of the dual quaternion embedding of $\mathrm{SE}(3)$ while yielding estimators of similar quality.
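The spectral method described above has three steps: form a block matrix from the pairwise ratio measurements, extract the leading eigenvector, then round it onto the group. The sketch below shows these steps on the simplest compact case, $\mathrm{U}(1) \cong \mathrm{SO}(2)$, where rounding is just normalizing each entry to unit modulus. It is a swapped-in illustration. The paper's dual quaternion embedding and rounding for $\mathrm{SE}(3)$ are not reproduced, and `spectral_sync_u1` is a hypothetical name.

```python
import numpy as np


def spectral_sync_u1(H):
    """Spectral synchronization over U(1) (unit complex numbers, i.e. planar rotations).

    H: (n, n) Hermitian matrix with H[i, j] ~ z_i * conj(z_j) + noise.
    Takes the leading eigenvector and rounds each entry onto the unit circle.
    """
    w, V = np.linalg.eigh(H)
    v = V[:, -1]              # eigh sorts eigenvalues in ascending order
    return v / np.abs(v)      # rounding: project each entry onto the group


rng = np.random.default_rng(0)
n, sigma = 50, 0.5
z = np.exp(2j * np.pi * rng.random(n))                     # ground-truth group elements
A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
H = np.outer(z, z.conj()) + sigma * (A + A.conj().T) / np.sqrt(2)  # Hermitian noise
z_hat = spectral_sync_u1(H)
g = np.vdot(z_hat, z)                                      # sum(conj(z_hat) * z)
z_hat_aligned = z_hat * g / abs(g)                         # remove the global phase ambiguity
err = np.mean(np.abs(z_hat_aligned - z))
```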
Submitted 14 July, 2023;
originally announced July 2023.
-
Detection and Recovery of Hidden Submatrices
Authors:
Marom Dadon,
Wasim Huleihel,
Tamir Bendory
Abstract:
In this paper, we study the problems of detection and recovery of hidden submatrices with elevated means inside a large Gaussian random matrix. We consider two different structures for the planted submatrices. In the first model, the planted matrices are disjoint, and their row and column indices can be arbitrary. Inspired by scientific applications, the second model restricts the row and column indices to be consecutive. In the detection problem, under the null hypothesis, the observed matrix is a realization of independent and identically distributed standard normal entries. Under the alternative, there exists a set of hidden submatrices with elevated means inside the same standard normal matrix. Recovery refers to the task of locating the hidden submatrices. For both problems, and for both models, we characterize the statistical and computational barriers by deriving information-theoretic lower bounds, designing and analyzing algorithms matching those bounds, and proving computational lower bounds based on the low-degree polynomial conjecture. In particular, we show that the space of the model parameters (i.e., number of planted submatrices, their dimensions, and elevated mean) can be partitioned into three regions: the impossible regime, where all algorithms fail; the hard regime, where detection or recovery is statistically possible but we give some evidence that polynomial-time algorithms do not exist; and finally the easy regime, where polynomial-time algorithms exist.
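The detection problem above can be illustrated with the simplest polynomial-time statistic, the sum of all entries. Under the null hypothesis it is N(0, 1) after normalization. Planted blocks shift its mean by (number of blocks) × k² × (elevated mean) / n. The sketch below uses an illustrative instance of the consecutive model and an arbitrary threshold. The paper's algorithms, thresholds and regime boundaries are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def sum_test_statistic(M):
    """Sum of all entries, normalised to be N(0, 1) when the entries are i.i.d. N(0, 1)."""
    return M.sum() / np.sqrt(M.size)


def planted_consecutive_blocks(n, k, lam, num, rng):
    """Hypothetical instance of the consecutive model: an n x n standard Gaussian matrix with
    `num` disjoint k x k blocks of mean `lam` on consecutive rows and columns (num * k <= n)."""
    M = rng.standard_normal((n, n))
    for j in range(num):
        M[j * k : (j + 1) * k, j * k : (j + 1) * k] += lam
    return M


rng = np.random.default_rng(0)
n, k, lam, num = 200, 20, 1.0, 5
null_stat = sum_test_statistic(rng.standard_normal((n, n)))
alt_stat = sum_test_statistic(planted_consecutive_blocks(n, k, lam, num, rng))
threshold = 5.0  # illustrative: reject the null hypothesis when the statistic exceeds this
# Under the alternative, the statistic has mean num * k**2 * lam / n = 10.
```

The sum test ignores where the blocks are, so it addresses only detection. Recovery requires a search over candidate locations, for instance scanning consecutive windows in the second model.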
Submitted 4 July, 2023; v1 submitted 11 June, 2023;
originally announced June 2023.
-
A stochastic approximate expectation-maximization for structure determination directly from cryo-EM micrographs
Authors:
Shay Kreymer,
Amit Singer,
Tamir Bendory
Abstract:
A single-particle cryo-electron microscopy (cryo-EM) measurement, called a micrograph, consists of multiple two-dimensional tomographic projections of a three-dimensional molecular structure at unknown locations, taken under unknown viewing directions. All existing cryo-EM algorithmic pipelines first locate and extract the projection images, and then reconstruct the structure from the extracted images. However, if the molecular structure is small, the signal-to-noise ratio (SNR) of the data is very low, and thus accurate detection of projection images within the micrograph is challenging. Consequently, all standard techniques fail in low-SNR regimes. To recover molecular structures from measurements of low SNR, and in particular small molecular structures, we devise a stochastic approximate expectation-maximization algorithm to estimate the three-dimensional structure directly from the micrograph, bypassing the need to locate the projection images. We corroborate our computational scheme with numerical experiments and present successful structure recoveries from simulated noisy measurements.
Submitted 24 February, 2023;
originally announced March 2023.
-
Finite alphabet phase retrieval
Authors:
Tamir Bendory,
Dan Edidin,
Ivan Gonzalez
Abstract:
We consider the finite alphabet phase retrieval problem: recovering a signal whose entries lie in a small alphabet of possible values from its Fourier magnitudes. This problem arises in the celebrated technology of X-ray crystallography to determine the atomic structure of biological molecules. Our main result states that for generic values of the alphabet, two signals have the same Fourier magnitudes if and only if several partitions have the same difference sets. Thus, the finite alphabet phase retrieval problem reduces to the combinatorial problem of determining a signal from those difference sets. Notably, this result holds true when one of the letters of the alphabet is zero, namely, for sparse signals with finite alphabet, which is the situation in X-ray crystallography.
Submitted 7 April, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
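For the simplest alphabet {0, 1}, the link between Fourier magnitudes and difference sets can be checked directly: |FFT(x)|^2 is the DFT of the periodic autocorrelation of x, and for a 0/1 signal that autocorrelation counts the cyclic differences within the support. The sketch below (helper names are hypothetical; it covers the binary case only, not the general partition result) compares the two for a pair of supports in Z_8 that are not shifts or reflections of each other.

```python
import numpy as np
from collections import Counter


def indicator(support, N):
    """Signal over the alphabet {0, 1} with ones on `support` (indices mod N)."""
    x = np.zeros(N)
    x[list(support)] = 1.0
    return x


def difference_multiset(support, N):
    """Cyclic differences (j - i) mod N over all ordered pairs in the support."""
    return Counter((j - i) % N for i in support for j in support)


def fourier_magnitudes(support, N):
    return np.abs(np.fft.fft(indicator(support, N)))


# Two supports in Z_8 that are not related by a shift or a reflection.
A, B = {0, 1, 2, 5}, {0, 1, 3, 4}
same_differences = difference_multiset(A, 8) == difference_multiset(B, 8)
same_magnitudes = np.allclose(fourier_magnitudes(A, 8), fourier_magnitudes(B, 8))
print(same_differences, same_magnitudes)  # True True
```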
-
Signal enhancement for two-dimensional cryo-EM data processing
Authors:
Guy Sharon,
Yoel Shkolnisky,
Tamir Bendory
Abstract:
Different tasks in the computational pipeline of single-particle cryo-electron microscopy (cryo-EM) require enhancing the quality of the highly noisy raw images. To this end, we develop an efficient algorithm for signal enhancement of cryo-EM images. The enhanced images can be used for a variety of downstream tasks, such as 2-D classification, removing uninformative images, constructing ab initio models, generating templates for particle picking, providing a quick assessment of the data set, dimensionality reduction, and symmetry detection. The algorithm includes built-in quality measures to assess its performance and alleviate the risk of model bias. We demonstrate the effectiveness of the proposed algorithm on several experimental data sets. In particular, we show that the quality of the resulting images is high enough to produce ab initio models of $\sim 10$ Å resolution. The algorithm is accompanied by publicly available, documented, and easy-to-use code.
Submitted 2 December, 2022;
originally announced December 2022.
-
The sample complexity of sparse multi-reference alignment and single-particle cryo-electron microscopy
Authors:
Tamir Bendory,
Dan Edidin
Abstract:
Multi-reference alignment (MRA) is the problem of recovering a signal from its multiple noisy copies, each acted upon by a random group element. MRA is mainly motivated by single-particle cryo-electron microscopy (cryo-EM) that has recently joined X-ray crystallography as one of the two leading technologies to reconstruct biological molecular structures. Previous papers have shown that in the high noise regime, the sample complexity of MRA and cryo-EM is $n=ω(σ^{2d})$, where $n$ is the number of observations, $σ^2$ is the variance of the noise, and $d$ is the lowest-order moment of the observations that uniquely determines the signal. In particular, it was shown that in many cases, $d=3$ for generic signals, and thus the sample complexity is $n=ω(σ^6)$.
In this paper, we analyze the second moment of the MRA and cryo-EM models. First, we show that in both models the second moment determines the signal up to a set of unitary matrices, whose dimension is governed by the decomposition of the space of signals into irreducible representations of the group. Second, we derive sparsity conditions under which a signal can be recovered from the second moment, implying a sample complexity of $n=ω(σ^4)$. Notably, we show that the sample complexity of cryo-EM is $n=ω(σ^4)$ if at most one third of the coefficients representing the molecular structure are non-zero; this bound is near-optimal. The analysis is based on tools from representation theory and algebraic geometry. We also derive bounds on recovering a sparse signal from its power spectrum, which is the main computational problem of X-ray crystallography.
Submitted 14 August, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
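For MRA with circular shifts, the second moment discussed above is equivalent to the power spectrum. A minimal sketch, assuming 1-D signals, uniformly random circular shifts and white Gaussian noise of known variance sigma^2 (function names are hypothetical): each observation's power spectrum is shift-invariant and biased by N*sigma^2, so averaging and subtracting the bias gives an unbiased estimate.

```python
import numpy as np


def mra_observations(x, n, sigma, rng):
    """n noisy, randomly circularly shifted copies of x (one per row)."""
    N = x.size
    shifts = rng.integers(N, size=n)
    # Row i equals np.roll(x, shifts[i]).
    idx = (np.arange(N)[None, :] - shifts[:, None]) % N
    return x[idx] + sigma * rng.standard_normal((n, N))


def power_spectrum_estimate(Y, sigma):
    """Unbiased estimate of |FFT(x)|^2 from the second moment of the data.

    Circular shifts only change the phases of the DFT, so |FFT(y_i)|^2 has
    mean |FFT(x)|^2 + N * sigma**2 for white Gaussian noise."""
    N = Y.shape[1]
    return np.mean(np.abs(np.fft.fft(Y, axis=1)) ** 2, axis=0) - N * sigma**2
```

When sigma is large, the noise term dominates and the standard deviation of the estimate is of order N*sigma^2/sqrt(n), so a fixed accuracy requires n to grow like sigma^4, in line with $n=ω(σ^4)$ for $d=2$. The power spectrum alone does not determine a generic signal, which is where sparsity conditions enter.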
-
Unsupervised particle sorting for cryo-EM using probabilistic PCA
Authors:
Gili Weiss-Dicker,
Amitay Eldar,
Yoel Shkolnisky,
Tamir Bendory
Abstract:
Single-particle cryo-electron microscopy (cryo-EM) is a leading technology to resolve the structure of molecules. Early in the process, the user detects potential particle images in the raw data. Typically, there are many false detections as a result of high levels of noise and contamination. Currently, removing the false detections requires human intervention to sort hundreds of thousands of images. We propose a statistically principled unsupervised algorithm to remove non-particle images. We model the particle images as a union of low-dimensional subspaces, assuming non-particle images are arbitrarily scattered in the high-dimensional space. The algorithm is based on an extension of the probabilistic PCA framework to robustly learn a non-linear model of union of subspaces. This provides a flexible model for cryo-EM data, and allows one to automatically remove images that correspond to pure noise and contamination. Numerical experiments corroborate the effectiveness of the sorting algorithm.
Submitted 7 March, 2023; v1 submitted 23 October, 2022;
originally announced October 2022.
-
K-sample Multiple Hypothesis Testing for Signal Detection
Authors:
Uriel Shiterburd,
Tamir Bendory,
Amichai Painsky
Abstract:
This paper studies the classical problem of estimating the locations of signal occurrences in a noisy measurement. Based on a multiple hypothesis testing scheme, we design a K-sample statistical test to control the false discovery rate (FDR). Specifically, we first convolve the noisy measurement with a smoothing kernel, and find all local maxima. Then, we evaluate the joint probability of K entries in the vicinity of each local maximum, derive the corresponding p-value, and apply the Benjamini-Hochberg procedure to account for multiplicity. We demonstrate through extensive experiments that our proposed method, with K=2, controls the prescribed FDR while increasing the power compared to a one-sample test.
Submitted 23 September, 2022;
originally announced September 2022.
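The pipeline described above (smoothing, local maxima, p-values, Benjamini-Hochberg) can be sketched as follows. This is a simplified one-sample version under stated assumptions: the noise is white Gaussian with known sigma, and each peak's p-value is a plain Gaussian tail of its smoothed value, which ignores the conditioning on being a local maximum. It is a placeholder for, not an implementation of, the paper's K-sample statistic; the Benjamini-Hochberg step is the standard procedure, and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def benjamini_hochberg(pvals, q):
    """Standard BH step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True  # reject all up to the largest passing rank
    return reject


def local_maxima(z):
    """Indices of strict interior local maxima of a 1-D array."""
    return np.nonzero((z[1:-1] > z[:-2]) & (z[1:-1] > z[2:]))[0] + 1


def detect(y, kernel, sigma, q=0.1):
    """Smooth, find peaks, assign naive Gaussian-tail p-values, apply BH at level q."""
    z = np.convolve(y, kernel, mode="same")
    peaks = local_maxima(z)
    scale = sigma * np.linalg.norm(kernel)  # std of smoothed white noise
    nd = NormalDist()
    pvals = np.array([nd.cdf(-z[i] / scale) for i in peaks])
    return peaks[benjamini_hochberg(pvals, q)]
```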
-
Autocorrelation analysis for cryo-EM with sparsity constraints: Improved sample complexity and projection-based algorithms
Authors:
Tamir Bendory,
Yuehaw Khoo,
Joe Kileel,
Oscar Mickelin,
Amit Singer
Abstract:
The number of noisy images required for molecular reconstruction in single-particle cryo-electron microscopy (cryo-EM) is governed by the autocorrelations of the observed, randomly-oriented, noisy projection images. In this work, we consider the effect of imposing sparsity priors on the molecule. We use techniques from signal processing, optimization, and applied algebraic geometry to obtain new theoretical and computational contributions for this challenging non-linear inverse problem with sparsity constraints. We prove that molecular structures modeled as sums of Gaussians are uniquely determined by the second-order autocorrelation of their projection images, implying that the sample complexity is proportional to the square of the variance of the noise. This theory improves upon the non-sparse case, where the third-order autocorrelation is required for uniformly-oriented particle images and the sample complexity scales with the cube of the noise variance. Furthermore, we build a computational framework to reconstruct molecular structures which are sparse in the wavelet basis. This method combines the sparse representation for the molecule with projection-based techniques used for phase retrieval in X-ray crystallography.
Submitted 1 May, 2023; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Detecting non-overlapping signals with dynamic programming
Authors:
Mordechai Roth,
Amichai Painsky,
Tamir Bendory
Abstract:
This paper studies the classical problem of detecting the locations of signal occurrences in a one-dimensional noisy measurement. Assuming the signal occurrences do not overlap, we formulate the detection task as a constrained likelihood optimization problem, and design a computationally efficient dynamic program that attains its optimal solution. Our proposed framework is scalable, simple to implement, and robust to model uncertainties. We show by extensive numerical experiments that our algorithm accurately estimates the locations in dense and noisy environments, and outperforms alternative methods.
Submitted 17 February, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
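A minimal sketch of a dynamic program of this type, assuming white Gaussian noise with known sigma, a known template x of length L, and a free number of occurrences (the paper's exact constraints may differ; names are hypothetical). Each start index gets its log-likelihood gain, and the recursion f[i] = max(f[i+1], score[i] + f[i+L]) selects the best set of non-overlapping starts in linear time.

```python
import numpy as np


def likelihood_scores(y, x, sigma):
    """Gaussian log-likelihood gain of placing template x at each start index.

    Placing x at start i changes the log-likelihood by
    (<y[i:i+L], x> - ||x||^2 / 2) / sigma^2."""
    corr = np.correlate(y, x, mode="valid")  # corr[i] = sum_j y[i + j] * x[j]
    return (corr - 0.5 * np.dot(x, x)) / sigma**2


def best_nonoverlapping(scores, L):
    """Maximize the total score over sets of start indices spaced >= L apart."""
    n = len(scores)
    f = np.zeros(n + L)             # f[i]: best total using starts >= i
    take = np.zeros(n, dtype=bool)  # whether the optimum from i uses start i
    for i in range(n - 1, -1, -1):
        with_i = scores[i] + f[i + L]
        take[i] = with_i > f[i + 1]
        f[i] = max(with_i, f[i + 1])
    starts, i = [], 0
    while i < n:                    # backtrack the optimal choices
        if take[i]:
            starts.append(i)
            i += L
        else:
            i += 1
    return float(f[0]), starts
```

For instance, best_nonoverlapping([3, 5, 3, 0, 4], 2) returns (10.0, [0, 2, 4]): taking the single largest score 5 would block both neighbors.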
-
Unrolled algorithms for group synchronization
Authors:
Noam Janco,
Tamir Bendory
Abstract:
The group synchronization problem involves estimating a collection of group elements from noisy measurements of their pairwise ratios. This task is a key component in many computational problems, including the molecular reconstruction problem in single-particle cryo-electron microscopy (cryo-EM). The standard methods to estimate the group elements are based on iteratively applying linear and non-linear operators, and are not necessarily optimal. Motivated by the structural similarity to deep neural networks, we adopt the concept of algorithm unrolling, where training data is used to optimize the algorithm. We design unrolled algorithms for several group synchronization instances, including synchronization over the group of 3-D rotations: the synchronization problem in cryo-EM. We also apply a similar approach to the multi-reference alignment problem. We show by numerical experiments that the unrolling strategy outperforms existing synchronization algorithms in a wide variety of scenarios.
Submitted 8 December, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
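As a concrete instance of "iteratively applying linear and non-linear operators", here is a sketch of the projected power method for angular synchronization over U(1) (phases). It is a standard baseline, not the unrolled algorithm of the paper. It assumes the pairwise measurements are arranged in a Hermitian matrix H ≈ z z^* + noise, where z has unit-modulus entries; the function name is hypothetical.

```python
import numpy as np


def angular_sync(H, iters=50):
    """Estimate unit-modulus z (up to a global phase) from H ~ z z^* + noise.

    Spectral initialization, then the projected power method: a linear step
    (multiply by H) followed by a non-linear step (entrywise projection onto
    the unit circle)."""
    _, V = np.linalg.eigh(H)  # eigenvalues in ascending order
    v = V[:, -1]              # leading eigenvector
    v = v / np.abs(v)
    for _ in range(iters):
        v = H @ v
        v = v / np.abs(v)
    return v
```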
-
Denoiser-based projections for 2-D super-resolution multi-reference alignment
Authors:
Jonathan Shani,
Tom Tirer,
Raja Giryes,
Tamir Bendory
Abstract:
We study the 2-D super-resolution multi-reference alignment (SR-MRA) problem: estimating an image from its down-sampled, circularly-translated, and noisy copies. The SR-MRA problem serves as a mathematical abstraction of the structure determination problem for biological molecules. Since the SR-MRA problem is ill-posed without prior knowledge, accurate image estimation relies on designing priors that accurately describe the statistics of the images of interest. In this work, we build on recent advances in image processing, and harness the power of denoisers as priors of images. In particular, we suggest using denoisers as projections, and design two computational frameworks to estimate the image: projected expectation-maximization and projected method of moments. We provide an efficient GPU implementation, and demonstrate the effectiveness of these algorithms by extensive numerical experiments on a wide range of parameters and images.
Submitted 2 May, 2024; v1 submitted 10 April, 2022;
originally announced April 2022.
-
Algebraic theory of phase retrieval
Authors:
Tamir Bendory,
Dan Edidin
Abstract:
The purpose of this article is to discuss recent advances in the growing field of phase retrieval, and to publicize open problems that we believe will be of interest to mathematicians in general, and algebraists in particular.
Submitted 5 March, 2022;
originally announced March 2022.
-
On the Role of Channel Capacity in Learning Gaussian Mixture Models
Authors:
Elad Romanov,
Tamir Bendory,
Or Ordentlich
Abstract:
This paper studies the sample complexity of learning the $k$ unknown centers of a balanced Gaussian mixture model (GMM) in $\mathbb{R}^d$ with spherical covariance matrix $σ^2\mathbf{I}$. In particular, we are interested in the following question: what is the maximal noise level $σ^2$, for which the sample complexity is essentially the same as when estimating the centers from labeled measurements? To that end, we restrict attention to a Bayesian formulation of the problem, where the centers are uniformly distributed on the sphere $\sqrt{d}\mathcal{S}^{d-1}$. Our main results characterize the exact noise threshold $σ^2$ below which the GMM learning problem, in the large system limit $d,k\to\infty$, is as easy as learning from labeled observations, and above which it is substantially harder. The threshold occurs at $\frac{\log k}{d} = \frac12\log\left( 1+\frac{1}{σ^2} \right)$, which is the capacity of the additive white Gaussian noise (AWGN) channel. Thinking of the set of $k$ centers as a code, this noise threshold can be interpreted as the largest noise level for which the error probability of the code over the AWGN channel is small. Previous works on the GMM learning problem have identified the minimum distance between the centers as a key parameter in determining the statistical difficulty of learning the corresponding GMM. While our results are only proved for GMMs whose centers are uniformly distributed over the sphere, they hint that perhaps it is the decoding error probability associated with the center constellation as a channel code that determines the statistical difficulty of learning the corresponding GMM, rather than just the minimum distance.
Submitted 14 June, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
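Solving the threshold equation for the noise level gives an explicit formula: from $\frac{\log k}{d} = \frac12\log\left(1+\frac{1}{σ^2}\right)$ one gets $1+\frac{1}{σ^2} = k^{2/d}$, that is, $σ^2 = \frac{1}{k^{2/d}-1}$. The short sketch below evaluates it, assuming natural logarithms and unit per-coordinate signal power (as for centers on $\sqrt{d}\mathcal{S}^{d-1}$); the function names are hypothetical.

```python
import math


def noise_threshold(k, d):
    """Noise level sigma^2 at which (log k) / d equals the AWGN capacity
    0.5 * log(1 + 1 / sigma^2), i.e. sigma^2 = 1 / (k^(2/d) - 1)."""
    return 1.0 / math.expm1(2.0 * math.log(k) / d)


def awgn_capacity(sigma2):
    """Capacity (nats per real dimension) of the AWGN channel with unit signal power."""
    return 0.5 * math.log1p(1.0 / sigma2)
```

For example, k = 2^d centers (rate log 2 per dimension) give a threshold of σ^2 = 1/3.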
-
Near-optimal bounds for signal recovery from blind phaseless periodic short-time Fourier transform
Authors:
Tamir Bendory,
Chi-yu Cheng,
Dan Edidin
Abstract:
We study the problem of recovering a signal $x\in\mathbb{C}^N$ from samples of its phaseless periodic short-time Fourier transform (STFT): the magnitude of the Fourier transform of the signal multiplied by a sliding window $w\in \mathbb{C}^W$. We show that if the window $w$ is known, then a generic signal can be recovered, up to a global phase, from fewer than $4N$ phaseless STFT measurements. In the blind case, when the window is unknown, we show that the signal and the window can be determined simultaneously, up to a group of unavoidable ambiguities, from fewer than $4N+2W$ measurements. In both cases, our bounds are optimal, up to a constant smaller than two.
Submitted 22 September, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
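To make the measurement model concrete, the sketch below computes a phaseless periodic STFT. It assumes a window of length W zero-padded to N and shifted circularly by multiples of a step L that divides N; the exact sampling used in the paper may differ, and the function name is hypothetical. The checks illustrate two of the unavoidable ambiguities: a global phase of x and, in the blind case, scaling x by c and w by 1/c.

```python
import numpy as np


def phaseless_periodic_stft(x, w, L):
    """Rows m = 0..N/L-1: |DFT(x * w circularly shifted by m*L)|.

    Assumes len(x) = N is divisible by the step L and len(w) = W <= N."""
    N = x.size
    assert N % L == 0 and w.size <= N
    w_pad = np.zeros(N, dtype=complex)
    w_pad[: w.size] = w
    # np.roll(w_pad, m * L)[n] == w_pad[(n - m * L) % N]
    return np.array([np.abs(np.fft.fft(x * np.roll(w_pad, m * L)))
                     for m in range(N // L)])
```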
-
An approximate expectation-maximization for two-dimensional multi-target detection
Authors:
Shay Kreymer,
Amit Singer,
Tamir Bendory
Abstract:
We consider the two-dimensional multi-target detection (MTD) problem of estimating a target image from a noisy measurement that contains multiple copies of the image, each randomly rotated and translated. The MTD model serves as a mathematical abstraction of the structure reconstruction problem in single-particle cryo-electron microscopy, the chief motivation of this study. We focus on high noise regimes, where accurate detection of image occurrences within a measurement is impossible. To estimate the image, we develop an expectation-maximization framework that aims to maximize an approximation of the likelihood function. We demonstrate image recovery in highly noisy environments, and show that our framework outperforms the previously studied autocorrelation analysis in a wide range of parameters.
Submitted 29 March, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Generalized autocorrelation analysis for multi-target detection
Authors:
Ye'Ela Shalit,
Ran Weber,
Asaf Abas,
Shay Kreymer,
Tamir Bendory
Abstract:
We study the multi-target detection problem of recovering a target signal from a noisy measurement that contains multiple copies of the signal at unknown locations. Motivated by the structure reconstruction problem in cryo-electron microscopy, we focus on the high noise regime, where noise hampers accurate detection of signal occurrences. Previous works proposed an autocorrelation analysis framework to estimate the signal directly from the measurement, without detecting signal occurrences. Specifically, autocorrelation analysis entails finding a signal that best matches the observable autocorrelations by minimizing a least squares objective. This paper extends this line of research by developing a generalized autocorrelation analysis framework that replaces the least squares objective with a weighted least squares objective. The optimal weights can be computed directly from the data and guarantee favorable statistical properties. We demonstrate signal recovery from highly noisy measurements, and show that the proposed framework outperforms autocorrelation analysis in a wide range of parameters.
Submitted 24 September, 2021;
originally announced September 2021.
-
Sparse multi-reference alignment: sample complexity and computational hardness
Authors:
Tamir Bendory,
Oscar Mickelin,
Amit Singer
Abstract:
Motivated by the problem of determining the atomic structure of macromolecules using single-particle cryo-electron microscopy (cryo-EM), we study the sample and computational complexities of the sparse multi-reference alignment (MRA) model: the problem of estimating a sparse signal from its noisy, circularly shifted copies. Based on its tight connection to the crystallographic phase retrieval problem, we establish that if the number of observations is proportional to the square of the variance of the noise, then the sparse MRA problem is statistically feasible for sufficiently sparse signals. To investigate its computational hardness, we consider three types of computational frameworks: projection-based algorithms, bispectrum inversion, and convex relaxations. We show that a state-of-the-art projection-based algorithm achieves the optimal estimation rate, but its computational complexity is exponential in the sparsity level. The bispectrum framework provides a statistical-computational trade-off: it requires more observations (so its estimation rate is suboptimal), but its computational load is provably polynomial in the signal's length. The convex relaxation approach provides polynomial time algorithms (with a large exponent) that recover sufficiently sparse signals at the optimal estimation rate. We conclude the paper by discussing potential statistical and algorithmic implications for cryo-EM.
Submitted 23 September, 2021;
originally announced September 2021.
-
Dihedral multi-reference alignment
Authors:
Tamir Bendory,
Dan Edidin,
William Leeb,
Nir Sharon
Abstract:
We study the dihedral multi-reference alignment problem of estimating the orbit of a signal from multiple noisy observations of the signal, acted on by random elements of the dihedral group. We show that if the group elements are drawn from a generic distribution, the orbit of a generic signal is uniquely determined from the second moment of the observations. This implies that the optimal estimation rate in the high noise regime is proportional to the square of the variance of the noise. This is the first result of this type for multi-reference alignment over a non-abelian group with a non-uniform distribution of group elements. Based on tools from invariant theory and algebraic geometry, we also delineate conditions for unique orbit recovery for multi-reference alignment models over finite groups (namely, when the dihedral group is replaced by a general finite group) when the group elements are drawn from a generic distribution. Finally, we design and study numerically three computational frameworks for estimating the signal based on group synchronization, expectation-maximization, and the method of moments.
Submitted 4 January, 2022; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Compactification of the Rigid Motions Group in Image Processing
Authors:
Tamir Bendory,
Ido Hadi,
Nir Sharon
Abstract:
Image processing problems in general, and in particular in the field of single-particle cryo-electron microscopy, often require considering images up to their rotations and translations. Such problems were tackled successfully when considering images up to rotations only, using quantities which are invariant to the action of rotations on images. Extending these methods to cases where translations are involved is more complicated. Here we present a computationally feasible and theoretically sound approximate invariant to the action of rotations and translations on images. It allows one to approximately reduce image processing problems to similar problems over the sphere, a compact domain acted on by the group of 3D rotations, a compact group. We show that this invariant is induced by a family of mappings deforming, and thereby compactifying, the group structure of rotations and translations of the plane, i.e., the group of rigid motions, into the group of 3D rotations. Furthermore, we demonstrate its viability in two image processing tasks: multi-reference alignment and classification. To our knowledge, this is the first instance of a quantity that is either exactly or approximately invariant to rotations and translations of images that both rests on a sound theoretical foundation and is also applicable in practice.
Submitted 2 February, 2022; v1 submitted 25 June, 2021;
originally announced June 2021.
-
An accelerated expectation-maximization algorithm for multi-reference alignment
Authors:
Noam Janco,
Tamir Bendory
Abstract:
The multi-reference alignment (MRA) problem entails estimating an image from multiple noisy and rotated copies of itself. If the noise level is low, one can reconstruct the image by estimating the missing rotations, aligning the images, and averaging out the noise. While accurate rotation estimation is impossible if the noise level is high, the rotations can still be approximated, and thus can provide indispensable information. In particular, learning the approximation error can be harnessed for efficient image estimation. In this paper, we propose a new computational framework, called Synch-EM, that consists of angular synchronization followed by expectation-maximization (EM). The synchronization step results in a concentrated distribution of rotations; this distribution is learned and then incorporated into the EM as a Bayesian prior. The learned distribution also dramatically reduces the search space, and thus the computational load, of the EM iterations. We show by extensive numerical experiments that the proposed framework can significantly accelerate EM for MRA in high noise levels, occasionally by a few orders of magnitude, without degrading the reconstruction quality.
Submitted 15 June, 2022; v1 submitted 16 May, 2021;
originally announced May 2021.
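For reference, the plain EM iteration that Synch-EM accelerates can be sketched for a simplified 1-D model with cyclic shifts (the paper treats images and rotations). The sketch assumes a known noise level sigma and a fixed prior rho over shifts; a concentrated rho, such as one learned after synchronization, is where the Bayesian prior would enter. The name em_mra is hypothetical, and both EM steps use FFTs to handle all N shifts at once.

```python
import numpy as np


def em_mra(Y, sigma, x0, rho=None, iters=100):
    """EM for 1-D MRA with cyclic shifts: y_i = roll(x, s_i) + N(0, sigma^2 I).

    rho is a prior over the N shifts (uniform if None)."""
    n, N = Y.shape
    rho = np.full(N, 1.0 / N) if rho is None else np.asarray(rho, dtype=float)
    x = np.array(x0, dtype=float)
    Yf = np.fft.fft(Y, axis=1)
    for _ in range(iters):
        # E-step: corr[i, s] = <y_i, roll(x, s)> for all s via circular cross-correlation.
        corr = np.real(np.fft.ifft(Yf * np.conj(np.fft.fft(x)), axis=1))
        logw = corr / sigma**2 + np.log(rho)
        logw -= logw.max(axis=1, keepdims=True)  # numerical stability
        Wt = np.exp(logw)
        Wt /= Wt.sum(axis=1, keepdims=True)      # posterior over shifts
        # M-step: x = mean_i sum_s Wt[i, s] * roll(y_i, -s), also via FFT.
        x = np.real(np.fft.ifft(np.mean(Yf * np.conj(np.fft.fft(Wt, axis=1)), axis=0)))
    return x
```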
-
Two-dimensional multi-target detection: an autocorrelation analysis approach
Authors:
Shay Kreymer,
Tamir Bendory
Abstract:
We consider the two-dimensional multi-target detection problem of recovering a target image from a noisy measurement that contains multiple copies of the image, each randomly rotated and translated. Motivated by the structure reconstruction problem in single-particle cryo-electron microscopy, we focus on the high noise regime, where the noise hampers accurate detection of the image occurrences. We develop an autocorrelation analysis framework to estimate the image directly from a measurement with an arbitrary spacing distribution of image occurrences, bypassing the estimation of individual locations and rotations. We conduct extensive numerical experiments, and demonstrate image recovery in highly noisy environments. The code to reproduce all numerical experiments is publicly available at https://github.com/krshay/MTD-2D.
Submitted 17 January, 2022; v1 submitted 14 May, 2021;
originally announced May 2021.
-
The generalized method of moments for multi-reference alignment
Authors:
Asaf Abas,
Tamir Bendory,
Nir Sharon
Abstract:
This paper studies the application of the generalized method of moments (GMM) to multi-reference alignment (MRA): the problem of estimating a signal from its circularly-translated and noisy copies. We begin by proving that the GMM estimator maintains its asymptotic optimality for statistical models with group symmetry, including MRA. Then, we conduct a comprehensive numerical study and show that the GMM substantially outperforms the classical method of moments, whose application to MRA has been studied thoroughly in the literature. We also formulate the GMM to estimate a three-dimensional molecular structure using cryo-electron microscopy and present numerical results on simulated data.
Submitted 26 September, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Signal recovery from a few linear measurements of its high-order spectra
Authors:
Tamir Bendory,
Dan Edidin,
Shay Kreymer
Abstract:
The $q$-th order spectrum is a polynomial of degree $q$ in the entries of a signal $x\in\mathbb{C}^N$, which is invariant under circular shifts of the signal. For $q\geq 3$, this polynomial determines the signal uniquely, up to a circular shift, and is called a high-order spectrum. The high-order spectra, and in particular the bispectrum ($q=3$) and the trispectrum ($q=4$), play a prominent role in various statistical signal processing and imaging applications, such as phase retrieval and single-particle reconstruction. However, the dimension of the $q$-th order spectrum is $N^{q-1}$, far exceeding the dimension of $x$, leading to increased computational load and storage requirements. In this work, we show that it is unnecessary to store and process the full high-order spectra: a signal can be characterized uniquely, up to symmetries, from only $N+1$ linear measurements of its high-order spectra. The proof relies on tools from algebraic geometry and is corroborated by numerical experiments.
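The shift invariance of the bispectrum ($q=3$) can be checked directly. The sketch below is only an illustration under stated assumptions (NumPy, hypothetical function name) and is not the measurement scheme of the paper. It forms the full $N\times N$ array $B[k_1,k_2] = X[k_1]X[k_2]\overline{X[k_1+k_2]}$, with indices taken mod $N$. This also makes the $N^{q-1}=N^2$ storage cost concrete.

```python
import numpy as np


def bispectrum(x):
    """Full bispectrum B[k1, k2] = X[k1] X[k2] conj(X[k1 + k2]), indices mod N."""
    X = np.fft.fft(x)
    N = X.size
    k = np.arange(N)
    # Broadcasting builds the N x N grid of (k1, k2) pairs.
    return X[:, None] * X[None, :] * np.conj(X[(k[:, None] + k[None, :]) % N])
```

A circular shift by $s$ multiplies $X[k]$ by $e^{-2\pi i ks/N}$. The three phase factors in each entry of $B$ then cancel, so every entry is unchanged.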
Submitted 31 August, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
Multi-target detection with rotations
Authors:
Tamir Bendory,
Ti-Yen Lan,
Nicholas F. Marshall,
Iris Rukshin,
Amit Singer
Abstract:
We consider the multi-target detection problem of estimating a two-dimensional target image from a large noisy measurement image that contains many randomly rotated and translated copies of the target image. Motivated by single-particle cryo-electron microscopy, we focus on the low signal-to-noise regime, where it is difficult to estimate the locations and orientations of the target images in the measurement. Our approach uses autocorrelation analysis to estimate rotationally and translationally invariant features of the target image. We demonstrate that, regardless of the level of noise, our technique can be used to recover the target image when the measurement is sufficiently large.
Submitted 2 September, 2022; v1 submitted 19 January, 2021;
originally announced January 2021.
-
Multi-reference alignment in high dimensions: sample complexity and phase transition
Authors:
Elad Romanov,
Tamir Bendory,
Or Ordentlich
Abstract:
Multi-reference alignment entails estimating a signal in $\mathbb{R}^L$ from its circularly-shifted and noisy copies. This problem has been studied thoroughly in recent years, focusing on the finite-dimensional setting (fixed $L$). Motivated by single-particle cryo-electron microscopy, we analyze the sample complexity of the problem in the high-dimensional regime $L\to\infty$. Our analysis uncovers a phase transition phenomenon governed by the parameter $\alpha = L/(\sigma^2\log L)$, where $\sigma^2$ is the variance of the noise. When $\alpha>2$, the impact of the unknown circular shifts on the sample complexity is minor. Namely, the number of measurements required to achieve a desired accuracy $\varepsilon$ approaches $\sigma^2/\varepsilon$ for small $\varepsilon$; this is the sample complexity of estimating a signal in additive white Gaussian noise, which does not involve shifts. In sharp contrast, when $\alpha\leq 2$, the problem is significantly harder and the sample complexity grows substantially faster with $\sigma^2$.
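Once $L$ and $\sigma^2$ are fixed, locating a problem relative to the threshold is simple arithmetic. The helpers below are a hypothetical sketch. They assume the natural logarithm in $\alpha$, and they only report the asymptotic shift-free sample count $\sigma^2/\varepsilon$ quoted for $\alpha>2$. The abstract gives no closed form for $\alpha\leq 2$, so none is returned in that case.

```python
import math


def mra_alpha(L, sigma2):
    """Phase-transition parameter alpha = L / (sigma^2 * log L).

    The natural logarithm is an assumption of this sketch.
    """
    return L / (sigma2 * math.log(L))


def shift_free_sample_count(L, sigma2, eps):
    """Asymptotic sample count sigma^2 / eps when alpha > 2, else None.

    For alpha <= 2 the abstract only states that the count grows
    substantially faster with sigma^2, so no number is returned.
    """
    if mra_alpha(L, sigma2) > 2:
        return sigma2 / eps
    return None
```

For example, $L=1000$ and $\sigma^2=10$ give $\alpha\approx 14.5$, which is above the threshold. Raising the noise to $\sigma^2=100$ gives $\alpha\approx 1.45$, on the hard side of the transition.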
Submitted 30 September, 2021; v1 submitted 22 July, 2020;
originally announced July 2020.
-
Super-resolution multi-reference alignment
Authors:
Tamir Bendory,
Ariel Jaffe,
William Leeb,
Nir Sharon,
Amit Singer
Abstract:
We study super-resolution multi-reference alignment, the problem of estimating a signal from many circularly shifted, down-sampled, and noisy observations. We focus on the low SNR regime, and show that a signal in $\mathbb{R}^M$ is uniquely determined when the number $L$ of samples per observation is of the order of the square root of the signal's length $(L=O(\sqrt{M}))$. Phrased more informally, one can square the resolution. This result holds if the number of observations is proportional to at least 1/SNR$^3$. In contrast, with fewer observations recovery is impossible even when the observations are not down-sampled ($L=M$). The analysis combines tools from statistical signal processing and invariant theory. We design an expectation-maximization algorithm and demonstrate that it can super-resolve the signal in challenging SNR regimes.
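The observation model can be sketched as follows (NumPy; names are hypothetical). The sketch assumes that $L$ divides $M$, that shifts are uniformly random, and that the noise is i.i.d. Gaussian. Each observation circularly shifts $x\in\mathbb{R}^M$, keeps every $(M/L)$-th sample, and adds noise. It only generates data and is not the expectation-maximization algorithm of the paper.

```python
import numpy as np


def sr_mra_observations(x, L, n, sigma, rng):
    """n observations: shift x in R^M circularly, keep every (M // L)-th sample, add noise.

    Assumes L divides M. Returns the (n, L) observations and the shifts used.
    """
    M = x.size
    if M % L != 0:
        raise ValueError("L must divide M in this sketch")
    step = M // L
    shifts = rng.integers(0, M, size=n)
    # Row i of `shifted` equals np.roll(x, shifts[i]).
    idx = (np.arange(M)[None, :] - shifts[:, None]) % M
    shifted = x[idx]
    # Down-sample on a fixed grid, then add white Gaussian noise.
    return shifted[:, ::step] + sigma * rng.standard_normal((n, L)), shifts
```

The shift is applied before down-sampling, so different observations sample different sub-grids of $x$. This is what makes recovery beyond the $L$-sample resolution conceivable.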
Submitted 9 November, 2020; v1 submitted 27 June, 2020;
originally announced June 2020.
-
Toward a mathematical theory of the crystallographic phase retrieval problem
Authors:
Tamir Bendory,
Dan Edidin
Abstract:
Motivated by X-ray crystallography, a technology for determining the atomic structure of biological molecules, we study the crystallographic phase retrieval problem, arguably the leading and hardest phase retrieval setup. This problem entails recovering a $K$-sparse signal of length $N$ from its Fourier magnitude or, equivalently, from its periodic autocorrelation. Specifically, this work focuses on the fundamental question of uniqueness: what is the maximal sparsity level $K/N$ that allows a unique mapping between a signal and its Fourier magnitude, up to intrinsic symmetries? We design a systematic computational technique to affirm uniqueness for any specific pair $(K,N)$, and establish the following conjecture: the Fourier magnitude determines a generic signal uniquely, up to intrinsic symmetries, as long as $K\leq N/2$. Based on group-theoretic considerations and an additional computational technique, we formulate a second conjecture: if $K<N/2$, then for any signal the set of solutions to the crystallographic phase retrieval problem has measure zero in the set of all signals with a given Fourier magnitude. Together, these conjectures constitute the first attempt to establish a mathematical theory for the crystallographic phase retrieval problem.
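Two properties of the forward map can be checked numerically. First, the Fourier magnitude and the periodic autocorrelation carry the same information. Second, the Fourier magnitude of a real signal is invariant under circular shift, reflection, and global sign. The NumPy sketch below uses hypothetical helper names and a randomly drawn $K$-sparse signal. It checks the forward map only and does not address uniqueness.

```python
import numpy as np


def sparse_signal(N, K, rng):
    """Real signal of length N with K nonzero entries at random positions."""
    x = np.zeros(N)
    x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
    return x


def periodic_autocorrelation(x):
    """a[l] = sum_n x[n] x[(n + l) mod N], computed directly."""
    return np.array([np.dot(x, np.roll(x, -l)) for l in range(x.size)])


def autocorrelation_from_magnitude(x):
    """The same quantity, computed from the Fourier magnitude |X| alone."""
    return np.fft.ifft(np.abs(np.fft.fft(x)) ** 2).real
```

Because $|X|^2$ is the DFT of the periodic autocorrelation $a$, either one determines the other.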
Submitted 2 July, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
A note on Douglas-Rachford, gradients, and phase retrieval
Authors:
Eitan Levin,
Tamir Bendory
Abstract:
The properties of gradient techniques for the phase retrieval problem have received considerable attention in recent years. In almost all applications, however, the phase retrieval problem is solved using a family of algorithms that can be interpreted as variants of Douglas-Rachford splitting. In this work, we establish a connection between Douglas-Rachford and gradient algorithms. Specifically, we show that in some cases a generalization of Douglas-Rachford, called relaxed-reflect-reflect (RRR), can be viewed as gradient descent on a certain objective function. The solutions coincide with the critical points of that objective, which, in contrast to standard gradient techniques, are not its minimizers. Using the objective function, we give simple proofs of some basic properties of the RRR algorithm. Specifically, we describe its set of solutions, show a local convexity around any solution, and derive stability guarantees. Nevertheless, in its present state, the analysis does not elucidate the remarkable empirical performance of RRR and its global properties.
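The RRR update is commonly written as $x \leftarrow x + \beta\,(P_A(2P_B(x)-x) - P_B(x))$, where $P_A$ and $P_B$ are projections onto the two constraint sets. At a fixed point, $P_B(x)$ lies in $A\cap B$. The sketch below (NumPy, hypothetical names) runs this update on two lines in the plane, a convex toy case chosen because the answer is easy to verify. Phase retrieval constraint sets are nonconvex, and the sketch does not construct the objective function discussed in the note.

```python
import numpy as np


def project_hyperplane(x, a, b):
    """Orthogonal projection of x onto the hyperplane {z : a . z = b}."""
    return x - (np.dot(a, x) - b) / np.dot(a, a) * a


def rrr(x0, proj_a, proj_b, beta=0.5, iters=200):
    """Relaxed-reflect-reflect: x <- x + beta * (P_A(2 P_B(x) - x) - P_B(x)).

    Returns the final iterate and P_B of it. At a fixed point,
    P_B(x) belongs to both constraint sets.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        pb = proj_b(x)
        x = x + beta * (proj_a(2 * pb - x) - pb)
    return x, proj_b(x)


# Toy example: the lines x0 + x1 = 3 and x0 - 2 x1 = 0 meet at (2, 1).
a1, b1 = np.array([1.0, 1.0]), 3.0
a2, b2 = np.array([1.0, -2.0]), 0.0
_, solution = rrr(
    np.array([10.0, -7.0]),
    lambda z: project_hyperplane(z, a1, b1),
    lambda z: project_hyperplane(z, a2, b2),
)
```

With $\beta=1$ the update reduces to Douglas-Rachford splitting. In the toy example, the returned `solution` is the intersection point $(2,1)$ up to floating-point error.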
Submitted 4 June, 2020; v1 submitted 29 November, 2019;
originally announced November 2019.
-
Unsupervised particle sorting for high-resolution single-particle cryo-EM
Authors:
Ye Zhou,
Amit Moscovich,
Tamir Bendory,
Alberto Bartesaghi
Abstract:
Single-particle cryo-Electron Microscopy (EM) has become a popular technique for determining the structure of challenging biomolecules that are inaccessible to other technologies. Recent advances in automation, both in data collection and data processing, have significantly lowered the barrier for non-expert users to successfully execute the structure determination workflow. Many critical data processing steps, however, still require expert user intervention in order to converge to the correct high-resolution structure. In particular, strategies to identify homogeneous populations of particles rely heavily on subjective criteria that are not always consistent or reproducible among different users. Here, we explore the use of unsupervised strategies for particle sorting that are compatible with the autonomous operation of the image processing pipeline. More specifically, we show that particles can be successfully sorted based on a simple statistical model for the distribution of scores assigned during refinement. This represents an important step towards the development of automated workflows for protein structure determination using single-particle cryo-EM.
Submitted 22 October, 2019;
originally announced October 2019.
-
Image recovery from rotational and translational invariants
Authors:
Nicholas F. Marshall,
Ti-Yen Lan,
Tamir Bendory,
Amit Singer
Abstract:
We introduce a framework for recovering an image from its rotationally and translationally invariant features based on autocorrelation analysis. This work is an instance of the multi-target detection statistical model, which is mainly used to study the mathematical and computational properties of single-particle reconstruction using cryo-electron microscopy (cryo-EM) at low signal-to-noise ratios. We demonstrate with synthetic numerical experiments that an image can be reconstructed from rotationally and translationally invariant features and show that the reconstruction is robust to noise. These results constitute an important step towards the goal of structure determination of small biomolecules using cryo-EM.
Submitted 22 October, 2019;
originally announced October 2019.
-
Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities
Authors:
Tamir Bendory,
Alberto Bartesaghi,
Amit Singer
Abstract:
In recent years, an abundance of new molecular structures have been elucidated using cryo-electron microscopy (cryo-EM), largely due to advances in hardware technology and data processing techniques. Owing to these new exciting developments, cryo-EM was selected by Nature Methods as Method of the Year 2015, and the Nobel Prize in Chemistry 2017 was awarded to three pioneers in the field.
The main goal of this article is to introduce the challenging and exciting computational tasks involved in reconstructing 3-D molecular structures by cryo-EM. Determining molecular structures requires a wide range of computational tools in a variety of fields, including signal processing, estimation and detection theory, high-dimensional statistics, convex and non-convex optimization, spectral algorithms, dimensionality reduction, and machine learning. The tools from these fields must be adapted to work under exceptionally challenging conditions, including extreme noise levels, the presence of missing data, and massive datasets of up to several terabytes.
In addition, we present two statistical models: multi-reference alignment and multi-target detection, that abstract away much of the intricacies of cryo-EM, while retaining some of its essential features. Based on these abstractions, we discuss some recent intriguing results in the mathematical theory of cryo-EM, and delineate relations with group theory, invariant theory, and information theory.
Submitted 7 October, 2019; v1 submitted 1 August, 2019;
originally announced August 2019.
-
Multi-target Detection with an Arbitrary Spacing Distribution
Authors:
Ti-Yen Lan,
Tamir Bendory,
Nicolas Boumal,
Amit Singer
Abstract:
Motivated by the structure reconstruction problem in single-particle cryo-electron microscopy, we consider the multi-target detection model, where multiple copies of a target signal occur at unknown locations in a long measurement, further corrupted by additive Gaussian noise. At low noise levels, one can easily detect the signal occurrences and estimate the signal by averaging. However, in the presence of high noise, which is the focus of this paper, detection is impossible. Here, we propose two approaches to reconstruct the signal without the need to detect signal occurrences in the measurement: autocorrelation analysis and an approximate expectation-maximization algorithm. In particular, our methods apply to an arbitrary spacing distribution of signal occurrences. We demonstrate reconstructions with synthetic data and empirically show that the sample complexity of both methods scales as 1/SNR$^3$ in the low SNR regime.
Submitted 22 January, 2020; v1 submitted 8 May, 2019;
originally announced May 2019.
-
Multi-target detection with application to cryo-electron microscopy
Authors:
Tamir Bendory,
Nicolas Boumal,
William Leeb,
Eitan Levin,
Amit Singer
Abstract:
We consider the multi-target detection problem of recovering a set of signals that appear multiple times at unknown locations in a noisy measurement. In the low noise regime, one can estimate the signals by first detecting occurrences, then clustering and averaging them. In the high noise regime, however, neither detection nor clustering can be performed reliably, so strategies along these lines are destined to fail. Nevertheless, using autocorrelation analysis, we show that the inability to detect and cluster signal occurrences in the presence of high noise does not necessarily preclude signal estimation. Specifically, to estimate the signals, we derive simple relations between the autocorrelations of the observation and those of the signals. These autocorrelations can be estimated accurately at any noise level given a sufficiently long measurement. To recover the signals from the observed autocorrelations, we solve a set of polynomial equations through nonlinear least-squares. We analyze the well-posedness of the task, and demonstrate numerically the effectiveness of the method in a variety of settings.
The main goal of this work is to provide theoretical and numerical support for a recently proposed framework to image 3-D structures of biological macromolecules using cryo-electron microscopy in extreme noise levels.
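The autocorrelation relations are easiest to see in a noiseless, well-separated toy case. The NumPy sketch below uses hypothetical names and makes two assumptions: consecutive copies are separated by at least $L-1$ zeros, and each autocorrelation is normalized by the length of its own signal. Under these assumptions, the first- and second-order autocorrelations of the measurement (lags $0,\dots,L-1$) equal $\gamma = mL/N$ times those of the signal. With additive noise of variance $\sigma^2$, the zero-lag term also picks up a bias of $\sigma^2$ in expectation. Recovering $x$ from these quantities (the nonlinear least-squares step) is not shown.

```python
import numpy as np


def mtd_measurement(x, m, N, sigma, rng):
    """Place m copies of x (length L) in a length-N measurement and add noise.

    Copies start 3L samples apart, leaving gaps of at least L - 1 zeros
    (a well-separated arrangement assumed for this sketch).
    """
    L = x.size
    starts = np.arange(m) * 3 * L
    if starts[-1] + L > N:
        raise ValueError("measurement too short for m well-separated copies")
    y = np.zeros(N)
    for s in starts:
        y[s:s + L] += x
    return y + sigma * rng.standard_normal(N)


def autocorrelations(z, L, normalizer):
    """First-order a1 and second-order a2[l], l = 0..L-1, divided by normalizer."""
    a1 = z.sum() / normalizer
    a2 = np.array([np.dot(z[: z.size - l], z[l:]) for l in range(L)]) / normalizer
    return a1, a2
```

The well-separation assumption is what removes cross terms between different copies for lags up to $L-1$. This leaves each copy contributing exactly the autocorrelation of $x$.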
Submitted 3 June, 2019; v1 submitted 12 March, 2019;
originally announced March 2019.
-
Frequency-Resolved Optical Gating Recovery via Smoothing Gradient
Authors:
Samuel Pinilla,
Tamir Bendory,
Yonina C. Eldar,
Henry Arguello
Abstract:
Frequency-resolved optical gating (FROG) is a popular technique for complete characterization of ultrashort laser pulses. The data acquired in FROG, called the FROG trace, is the Fourier magnitude of the product of the unknown pulse with a time-shifted version of itself, for several different shifts. To estimate the pulse from the FROG trace, we propose an algorithm that minimizes a smoothed non-convex least-squares objective function. The method consists of two steps. First, we approximate the pulse by an iterative spectral algorithm. Then, the attained initialization is refined by a sequence of block stochastic gradient iterations. The algorithm is theoretically simple, numerically scalable, and easy to implement. Empirically, our approach outperforms the state of the art when the FROG trace is incomplete, that is, when only a few shifts are recorded. Simulations also suggest that the proposed algorithm has a computational cost similar to that of a state-of-the-art technique for both complete and incomplete data. In addition, we prove that in the vicinity of the true solution, the algorithm converges to a critical point. A Matlab implementation is publicly available at https://github.com/samuelpinilla/FROG.
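A discrete FROG trace can be computed in a few lines. The sketch below uses NumPy, and its function name is hypothetical. It assumes one common discrete formulation with circular indexing, $Z[k,j] = \bigl|\sum_n x[n]\,x[n+s_j]\,e^{-2\pi i kn/N}\bigr|^2$, where recording only a few shifts $s_j$ gives an incomplete trace. This is not the authors' Matlab implementation. The sketch also checks two trivial ambiguities of FROG: a global phase and a circular time shift of the pulse leave the trace unchanged.

```python
import numpy as np


def frog_trace(x, shifts):
    """|DFT_n(x[n] * x[n + s])|^2 for each shift s, with circular indexing.

    Column j of the result holds the spectrum for shifts[j].
    """
    # np.roll(x, -s)[n] == x[(n + s) % N]
    cols = [np.abs(np.fft.fft(x * np.roll(x, -s))) ** 2 for s in shifts]
    return np.stack(cols, axis=1)
```

A global phase $e^{i\phi}$ becomes $e^{2i\phi}$ in the product and disappears under the magnitude. A circular time shift of the pulse circularly shifts the product, which changes only the phase of its DFT.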
Submitted 8 September, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.
-
Heterogeneous multireference alignment for images with application to 2-D classification in single particle reconstruction
Authors:
Chao Ma,
Tamir Bendory,
Nicolas Boumal,
Fred Sigworth,
Amit Singer
Abstract:
Motivated by the task of 2-D classification in single particle reconstruction by cryo-electron microscopy (cryo-EM), we consider the problem of heterogeneous multireference alignment of images. In this problem, the goal is to estimate a (typically small) set of target images from a (typically large) collection of observations. Each observation is a rotated, noisy version of one of the target images. For each individual observation, neither the rotation nor the identity of the rotated target image is known. As the noise level in cryo-EM data is high, clustering the observations and estimating individual rotations is challenging. We propose a framework to estimate the target images directly from the observations, completely bypassing the need to cluster or register the images. The framework consists of two steps. First, we estimate rotation-invariant features of the images, such as the bispectrum. These features can be estimated to any desired accuracy, at any noise level, provided sufficiently many observations are collected. Then, we estimate the images from the invariant features. Numerical experiments on synthetic cryo-EM datasets demonstrate the effectiveness of the method. Finally, we outline future developments required to apply this method to experimental data.
Submitted 1 October, 2019; v1 submitted 11 October, 2018;
originally announced November 2018.