-
Elucidating the representation of images within an unconditional diffusion model denoiser
Authors:
Zahra Kadkhodaie,
Stéphane Mallat,
Eero Simoncelli
Abstract:
Generative diffusion models learn probability densities over diverse image datasets by estimating the score with a neural network trained to remove noise. Despite their remarkable success in generating high-quality images, the internal mechanisms of the underlying score networks are not well understood. Here, we examine a UNet trained for denoising on the ImageNet dataset, to better understand its…
▽ More
Generative diffusion models learn probability densities over diverse image datasets by estimating the score with a neural network trained to remove noise. Despite their remarkable success in generating high-quality images, the internal mechanisms of the underlying score networks are not well understood. Here, we examine a UNet trained for denoising on the ImageNet dataset, to better understand its internal representation and computation of the score. We show that the middle block of the UNet decomposes individual images into sparse subsets of active channels, and that the vector of spatial averages of these channels can provide a nonlinear representation of the underlying clean images. We develop a novel algorithm for stochastic reconstruction of images from this representation and demonstrate that it recovers a sample from a set of images defined by a target image representation. We then study the properties of the representation and demonstrate that Euclidean distances in the latent space correspond to distances between conditional densities induced by representations as well as semantic similarities in the image space. Applying a clustering algorithm in the representation space yields groups of images that share both fine details (e.g., specialized features, textured regions, small objects), as well as global structure, but are only partially aligned with object identities. Thus, we show for the first time that a network trained solely on denoising contains a rich and accessible sparse representation of images.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
Feature-guided score diffusion for sampling conditional densities
Authors:
Zahra Kadkhodaie,
Stéphane Mallat,
Eero P. Simoncelli
Abstract:
Score diffusion methods can learn probability densities from samples. The score of the noise-corrupted density is estimated using a deep neural network, which is then used to iteratively transport a Gaussian white noise density to a target density. Variants for conditional densities have been developed, but correct estimation of the corresponding scores is difficult. We avoid these difficulties by…
▽ More
Score diffusion methods can learn probability densities from samples. The score of the noise-corrupted density is estimated using a deep neural network, which is then used to iteratively transport a Gaussian white noise density to a target density. Variants for conditional densities have been developed, but correct estimation of the corresponding scores is difficult. We avoid these difficulties by introducing an algorithm that guides the diffusion with a projected score. The projection pushes the image feature vector towards the feature vector centroid of the target class. The projected score and the feature vectors are learned by the same network. Specifically, the image feature vector is defined as the spatial averages of the channels activations in select layers of the network. Optimizing the projected score for denoising loss encourages image feature vectors of each class to cluster around their centroids. It also leads to the separations of the centroids. We show that these centroids provide a low-dimensional Euclidean embedding of the class conditional densities. We demonstrate that the algorithm can generate high quality and diverse samples from the conditioning class. Conditional generation can be performed using feature vectors interpolated between those of the training set, demonstrating out-of-distribution generalization.
△ Less
Submitted 15 October, 2024;
originally announced October 2024.
-
Hierarchic Flows to Estimate and Sample High-dimensional Probabilities
Authors:
Etienne Lempereur,
Stéphane Mallat
Abstract:
Finding low-dimensional interpretable models of complex physical fields such as turbulence remains an open question, 80 years after the pioneer work of Kolmogorov. Estimating high-dimensional probability distributions from data samples suffers from an optimization and an approximation curse of dimensionality. It may be avoided by following a hierarchic probability flow from coarse to fine scales.…
▽ More
Finding low-dimensional interpretable models of complex physical fields such as turbulence remains an open question, 80 years after the pioneer work of Kolmogorov. Estimating high-dimensional probability distributions from data samples suffers from an optimization and an approximation curse of dimensionality. It may be avoided by following a hierarchic probability flow from coarse to fine scales. This inverse renormalization group is defined by conditional probabilities across scales, renormalized in a wavelet basis. For a $\varphi^4$ scalar potential, sampling these hierarchic models avoids the critical slowing down at the phase transition. An outstanding issue is to also approximate non-Gaussian fields having long-range interactions in space and across scales. We introduce low-dimensional models with robust multiscale approximations of high order polynomial energies. They are calculated with a second wavelet transform, which defines interactions over two hierarchies of scales. We estimate and sample these wavelet scattering models to generate 2D vorticity fields of turbulence, and images of dark matter densities.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Generalization in diffusion models arises from geometry-adaptive harmonic representations
Authors:
Zahra Kadkhodaie,
Florentin Guth,
Eero P. Simoncelli,
Stéphane Mallat
Abstract:
Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we sh…
▽ More
Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.
△ Less
Submitted 12 April, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Scattering Spectra Models for Physics
Authors:
Sihao Cheng,
Rudy Morel,
Erwan Allys,
Brice Ménard,
Stéphane Mallat
Abstract:
Physicists routinely need probabilistic models for a number of tasks such as parameter inference or the generation of new realizations of a field. Establishing such models for highly non-Gaussian fields is a challenge, especially when the number of samples is limited. In this paper, we introduce scattering spectra models for stationary fields and we show that they provide accurate and robust stati…
▽ More
Physicists routinely need probabilistic models for a number of tasks such as parameter inference or the generation of new realizations of a field. Establishing such models for highly non-Gaussian fields is a challenge, especially when the number of samples is limited. In this paper, we introduce scattering spectra models for stationary fields and we show that they provide accurate and robust statistical descriptions of a wide range of fields encountered in physics. These models are based on covariances of scattering coefficients, i.e. wavelet decomposition of a field coupled with a point-wise modulus. After introducing useful dimension reductions taking advantage of the regularity of a field under rotation and scaling, we validate these models on various multi-scale physical fields and demonstrate that they reproduce standard statistics, including spatial moments up to 4th order. These scattering spectra provide us with a low-dimensional structured representation that captures key properties encountered in a wide range of physical fields. These generic models can be used for data exploration, classification, parameter inference, symmetry detection, and component separation.
△ Less
Submitted 4 October, 2024; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Conditionally Strongly Log-Concave Generative Models
Authors:
Florentin Guth,
Etienne Lempereur,
Joan Bruna,
Stéphane Mallat
Abstract:
There is a growing gap between the impressive results of deep image generative models and classical algorithms that offer theoretical guarantees. The former suffer from mode collapse or memorization issues, limiting their application to scientific data. The latter require restrictive assumptions such as log-concavity to escape the curse of dimensionality. We partially bridge this gap by introducin…
▽ More
There is a growing gap between the impressive results of deep image generative models and classical algorithms that offer theoretical guarantees. The former suffer from mode collapse or memorization issues, limiting their application to scientific data. The latter require restrictive assumptions such as log-concavity to escape the curse of dimensionality. We partially bridge this gap by introducing conditionally strongly log-concave (CSLC) models, which factorize the data distribution into a product of conditional probability distributions that are strongly log-concave. This factorization is obtained with orthogonal projectors adapted to the data distribution. It leads to efficient parameter estimation and sampling algorithms, with theoretical guarantees, although the data distribution is not globally log-concave. We show that several challenging multiscale processes are conditionally log-concave using wavelet packet orthogonal projectors. Numerical results are shown for physical fields such as the $\varphi^4$ model and weak lensing convergence maps with higher resolution than in previous works.
△ Less
Submitted 31 May, 2023;
originally announced June 2023.
-
A Rainbow in Deep Network Black Boxes
Authors:
Florentin Guth,
Brice Ménard,
Gaspar Rochette,
Stéphane Mallat
Abstract:
A central question in deep learning is to understand the functions learned by deep networks. What is their approximation class? Do the learned weights and representations depend on initialization? Previous empirical work has evidenced that kernels defined by network activations are similar across initializations. For shallow networks, this has been theoretically studied with random feature models,…
▽ More
A central question in deep learning is to understand the functions learned by deep networks. What is their approximation class? Do the learned weights and representations depend on initialization? Previous empirical work has evidenced that kernels defined by network activations are similar across initializations. For shallow networks, this has been theoretically studied with random feature models, but an extension to deep networks has remained elusive. Here, we provide a deep extension of such random feature models, which we call the rainbow model. We prove that rainbow networks define deterministic (hierarchical) kernels in the infinite-width limit. The resulting functions thus belong to a data-dependent RKHS which does not depend on the weight randomness. We also verify numerically our modeling assumptions on deep CNNs trained on image classification tasks, and show that the trained networks approximately satisfy the rainbow hypothesis. In particular, rainbow networks sampled from the corresponding random feature model achieve similar performance as the trained networks. Our results highlight the central role played by the covariances of network weights at each layer, which are observed to be low-rank as a result of feature learning.
△ Less
Submitted 24 October, 2024; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Learning multi-scale local conditional probability models of images
Authors:
Zahra Kadkhodaie,
Florentin Guth,
Stéphane Mallat,
Eero P Simoncelli
Abstract:
Deep neural networks can learn powerful prior probability models for images, as evidenced by the high-quality generations obtained with recent score-based diffusion methods. But the means by which these networks capture complex global statistical structure, apparently without suffering from the curse of dimensionality, remain a mystery. To study this, we incorporate diffusion methods into a multi-…
▽ More
Deep neural networks can learn powerful prior probability models for images, as evidenced by the high-quality generations obtained with recent score-based diffusion methods. But the means by which these networks capture complex global statistical structure, apparently without suffering from the curse of dimensionality, remain a mystery. To study this, we incorporate diffusion methods into a multi-scale decomposition, reducing dimensionality by assuming a stationary local Markov model for wavelet coefficients conditioned on coarser-scale coefficients. We instantiate this model using convolutional neural networks (CNNs) with local receptive fields, which enforce both the stationarity and Markov properties. Global structures are captured using a CNN with receptive fields covering the entire (but small) low-pass image. We test this model on a dataset of face images, which are highly non-stationary and contain large-scale geometric structures. Remarkably, denoising, super-resolution, and image synthesis results all demonstrate that these structures can be captured with significantly smaller conditioning neighborhoods than required by a Markov model implemented in the pixel domain. Our results show that score estimation for large complex images can be reduced to low-dimensional Markov conditional models across scales, alleviating the curse of dimensionality.
△ Less
Submitted 6 March, 2023;
originally announced March 2023.
-
Wavelet Score-Based Generative Modeling
Authors:
Florentin Guth,
Simon Coste,
Valentin De Bortoli,
Stephane Mallat
Abstract:
Score-based generative models (SGMs) synthesize new data samples from Gaussian white noise by running a time-reversed Stochastic Differential Equation (SDE) whose drift coefficient depends on some probabilistic score. The discretization of such SDEs typically requires a large number of time steps and hence a high computational cost. This is because of ill-conditioning properties of the score that…
▽ More
Score-based generative models (SGMs) synthesize new data samples from Gaussian white noise by running a time-reversed Stochastic Differential Equation (SDE) whose drift coefficient depends on some probabilistic score. The discretization of such SDEs typically requires a large number of time steps and hence a high computational cost. This is because of ill-conditioning properties of the score that we analyze mathematically. We show that SGMs can be considerably accelerated, by factorizing the data distribution into a product of conditional probabilities of wavelet coefficients across scales. The resulting Wavelet Score-based Generative Model (WSGM) synthesizes wavelet coefficients with the same number of time steps at all scales, and its time complexity therefore grows linearly with the image size. This is proved mathematically over Gaussian distributions, and shown numerically over physical processes at phase transition and natural image datasets.
△ Less
Submitted 9 August, 2022;
originally announced August 2022.
-
Wavelet Conditional Renormalization Group
Authors:
Tanguy Marchand,
Misaki Ozawa,
Giulio Biroli,
Stéphane Mallat
Abstract:
We develop a multiscale approach to estimate high-dimensional probability distributions from a dataset of physical fields or configurations observed in experiments or simulations. In this way we can estimate energy functions (or Hamiltonians) and efficiently generate new samples of many-body systems in various domains, from statistical physics to cosmology. Our method -- the Wavelet Conditional Re…
▽ More
We develop a multiscale approach to estimate high-dimensional probability distributions from a dataset of physical fields or configurations observed in experiments or simulations. In this way we can estimate energy functions (or Hamiltonians) and efficiently generate new samples of many-body systems in various domains, from statistical physics to cosmology. Our method -- the Wavelet Conditional Renormalization Group (WC-RG) -- proceeds scale by scale, estimating models for the conditional probabilities of "fast degrees of freedom" conditioned by coarse-grained fields. These probability distributions are modeled by energy functions associated with scale interactions, and are represented in an orthogonal wavelet basis. WC-RG decomposes the microscopic energy function as a sum of interaction energies at all scales and can efficiently generate new samples by going from coarse to fine scales. Near phase transitions, it avoids the "critical slowing down" of direct estimation and sampling algorithms. This is explained theoretically by combining results from RG and wavelet theories, and verified numerically for the Gaussian and $\varphi^4$ field theories. We show that multiscale WC-RG energy-based models are more general than local potential models and can capture the physics of complex many-body interacting systems at all length scales. This is demonstrated for weak-gravitational-lensing fields reflecting dark matter distributions in cosmology, which include long-range interactions with long-tail probability distributions. WC-RG has a large number of potential applications in non-equilibrium systems, where the underlying distribution is not known {\it a priori}. Finally, we discuss the connection between WC-RG and deep network architectures.
△ Less
Submitted 11 July, 2022;
originally announced July 2022.
-
Scale Dependencies and Self-Similar Models with Wavelet Scattering Spectra
Authors:
Rudy Morel,
Gaspar Rochette,
Roberto Leonarduzzi,
Jean-Philippe Bouchaud,
Stéphane Mallat
Abstract:
We introduce the wavelet scattering spectra which provide non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint correlation across time and scales of wavelet coefficients and their modulus. This correlation matrix is nearly diagonalized by a second wavelet transform…
▽ More
We introduce the wavelet scattering spectra which provide non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint correlation across time and scales of wavelet coefficients and their modulus. This correlation matrix is nearly diagonalized by a second wavelet transform, which defines the scattering spectra. We show that this vector of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. We prove that self-similar processes have scattering spectra which are scale invariant. This property can be tested statistically on a single realization and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering spectra coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.
△ Less
Submitted 19 June, 2023; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Generalized Rectifier Wavelet Covariance Models For Texture Synthesis
Authors:
Antoine Brochard,
Sixin Zhang,
Stéphane Mallat
Abstract:
State-of-the-art maximum entropy models for texture synthesis are built from statistics relying on image representations defined by convolutional neural networks (CNN). Such representations capture rich structures in texture images, outperforming wavelet-based representations in this regard. However, conversely to neural networks, wavelets offer meaningful representations, as they are known to det…
▽ More
State-of-the-art maximum entropy models for texture synthesis are built from statistics relying on image representations defined by convolutional neural networks (CNN). Such representations capture rich structures in texture images, outperforming wavelet-based representations in this regard. However, conversely to neural networks, wavelets offer meaningful representations, as they are known to detect structures at multiple scales (e.g. edges) in images. In this work, we propose a family of statistics built upon non-linear wavelet based representations, that can be viewed as a particular instance of a one-layer CNN, using a generalized rectifier non-linearity. These statistics significantly improve the visual quality of previous classical wavelet-based models, and allow one to produce syntheses of similar quality to state-of-the-art models, on both gray-scale and color textures.
△ Less
Submitted 14 March, 2022;
originally announced March 2022.
-
Phase Collapse in Neural Networks
Authors:
Florentin Guth,
John Zarka,
Stéphane Mallat
Abstract:
Deep convolutional classifiers linearly separate image classes and improve accuracy as depth increases. They progressively reduce the spatial dimension whereas the number of channels grows with depth. Spatial variability is therefore transformed into variability along channels. A fundamental challenge is to understand the role of non-linearities together with convolutional filters in this transfor…
▽ More
Deep convolutional classifiers linearly separate image classes and improve accuracy as depth increases. They progressively reduce the spatial dimension whereas the number of channels grows with depth. Spatial variability is therefore transformed into variability along channels. A fundamental challenge is to understand the role of non-linearities together with convolutional filters in this transformation. ReLUs with biases are often interpreted as thresholding operators that improve discrimination through sparsity. This paper demonstrates that it is a different mechanism called phase collapse which eliminates spatial variability while linearly separating classes. We show that collapsing the phases of complex wavelet coefficients is sufficient to reach the classification accuracy of ResNets of similar depths. However, replacing the phase collapses with thresholding operators that enforce sparsity considerably degrades the performance. We explain these numerical results by showing that the iteration of phase collapses progressively improves separation of classes, as opposed to thresholding non-linearities.
△ Less
Submitted 21 March, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Separation and Concentration in Deep Networks
Authors:
John Zarka,
Florentin Guth,
Stéphane Mallat
Abstract:
Numerical experiments demonstrate that deep neural network classifiers progressively separate class distributions around their mean, achieving linear separability on the training set, and increasing the Fisher discriminant ratio. We explain this mechanism with two types of operators. We prove that a rectifier without biases applied to sign-invariant tight frames can separate class means and increa…
▽ More
Numerical experiments demonstrate that deep neural network classifiers progressively separate class distributions around their mean, achieving linear separability on the training set, and increasing the Fisher discriminant ratio. We explain this mechanism with two types of operators. We prove that a rectifier without biases applied to sign-invariant tight frames can separate class means and increase Fisher ratios. On the opposite, a soft-thresholding on tight frames can reduce within-class variabilities while preserving class means. Variance reduction bounds are proved for Gaussian mixture models. For image classification, we show that separation of class means can be achieved with rectified wavelet tight frames that are not learned. It defines a scattering transform. Learning $1 \times 1$ convolutional tight frames along scattering channels and applying a soft-thresholding reduces within-class variabilities. The resulting scattering network reaches the classification accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no learned biases.
△ Less
Submitted 15 March, 2021; v1 submitted 18 December, 2020;
originally announced December 2020.
-
Particle gradient descent model for point process generation
Authors:
Antoine Brochard,
Bartłomiej Błaszczyszyn,
Stéphane Mallat,
Sixin Zhang
Abstract:
This paper presents a statistical model for stationary ergodic point processes, estimated from a single realization observed in a square window. With existing approaches in stochastic geometry, it is very difficult to model processes with complex geometries formed by a large number of particles. Inspired by recent works on gradient descent algorithms for sampling maximum-entropy models, we describ…
▽ More
This paper presents a statistical model for stationary ergodic point processes, estimated from a single realization observed in a square window. With existing approaches in stochastic geometry, it is very difficult to model processes with complex geometries formed by a large number of particles. Inspired by recent works on gradient descent algorithms for sampling maximum-entropy models, we describe a model that allows for fast sampling of new configurations reproducing the statistics of the given observation. Starting from an initial random configuration, its particles are moved according to the gradient of an energy, in order to match a set of prescribed moments (functionals). Our moments are defined via a phase harmonic operator on the wavelet transform of point patterns. They allow one to capture multi-scale interactions between the particles, while controlling explicitly the number of moments by the scales of the structures to model. We present numerical experiments on point processes with various geometric structures, and assess the quality of the model by spectral and topological data analysis.
△ Less
Submitted 15 September, 2022; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Maximum Entropy Models from Phase Harmonic Covariances
Authors:
Sixin Zhang,
Stéphane Mallat
Abstract:
The covariance of a stationary process $X$ is diagonalized by a Fourier transform. It does not take into account the complex Fourier phase and defines Gaussian maximum entropy models. We introduce a general family of phase harmonic covariance moments, which rely on complex phases to capture non-Gaussian properties. They are defined as the covariance of $\hat{H} (L X)$, where $L$ is a complex linea…
▽ More
The covariance of a stationary process $X$ is diagonalized by a Fourier transform. It does not take into account the complex Fourier phase and defines Gaussian maximum entropy models. We introduce a general family of phase harmonic covariance moments, which rely on complex phases to capture non-Gaussian properties. They are defined as the covariance of $\hat{H} (L X)$, where $L$ is a complex linear operator and $\hat{H} $ is a non-linear phase harmonic operator which multiplies the phase of each complex coefficient by integers. The operator $\hat{H} (L X)$ can also be calculated from rectifiers, which relates $\hat{H} (L X)$ to neural network coefficients. If $L$ is a Fourier transform then the covariance is a sparse matrix whose non-zero off-diagonal coefficients capture dependencies between frequencies. These coefficients have similarities with high order moment, but smaller statistical variabilities because $\hat{H} (L X)$ is Lipschitz. If $L$ is a complex wavelet transform then off-diagonal coefficients reveal dependencies across scales, which specify the geometry of local coherent structures. We introduce maximum entropy models conditioned by these wavelet phase harmonic covariances. The precision of these models is numerically evaluated to synthesize images of turbulent flows and other stationary processes.
△ Less
Submitted 29 January, 2021; v1 submitted 22 November, 2019;
originally announced November 2019.
-
Deep Network Classification by Scattering and Homotopy Dictionary Learning
Authors:
John Zarka,
Louis Thiry,
Tomás Angles,
Stéphane Mallat
Abstract:
We introduce a sparse scattering deep convolutional neural network, which provides a simple model to analyze properties of deep representation learning for classification. Learning a single dictionary matrix with a classifier yields a higher classification accuracy than AlexNet over the ImageNet 2012 dataset. The network first applies a scattering transform that linearizes variabilities due to geo…
▽ More
We introduce a sparse scattering deep convolutional neural network, which provides a simple model to analyze properties of deep representation learning for classification. Learning a single dictionary matrix with a classifier yields a higher classification accuracy than AlexNet over the ImageNet 2012 dataset. The network first applies a scattering transform that linearizes variabilities due to geometric transformations such as translations and small deformations. A sparse $\ell^1$ dictionary coding reduces intra-class variability while preserving class separation through projections over unions of linear spaces. It is implemented in a deep convolutional network with a homotopy algorithm having an exponential convergence. A convergence proof is given in a general framework that includes ALISTA. Classification results are analyzed on ImageNet.
△ Less
Submitted 20 February, 2020; v1 submitted 8 October, 2019;
originally announced October 2019.
-
Kymatio: Scattering Transforms in Python
Authors:
Mathieu Andreux,
Tomás Angles,
Georgios Exarchakis,
Roberto Leonarduzzi,
Gaspar Rochette,
Louis Thiry,
John Zarka,
Stéphane Mallat,
Joakim andén,
Eugene Belilovsky,
Joan Bruna,
Vincent Lostanlen,
Muawiz Chaudhary,
Matthew J. Hirn,
Edouard Oyallon,
Sixin Zhang,
Carmine Cella,
Michael Eickenberg
Abstract:
The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU…
▽ More
The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU), offering a considerable speed up over CPU implementations. The package also has a small memory footprint, resulting inefficient memory usage. The source code, documentation, and examples are available undera BSD license at https://www.kymat.io/
△ Less
Submitted 31 May, 2022; v1 submitted 28 December, 2018;
originally announced December 2018.
-
Statistical learning of geometric characteristics of wireless networks
Authors:
Antoine Brochard,
Bartłomiej Błaszczyszyn,
Stéphane Mallat,
Sixin Zhang
Abstract:
Motivated by the prediction of cell loads in cellular networks, we formulate the following new, fundamental problem of statistical learning of geometric marks of point processes: An unknown marking function, depending on the geometry of point patterns, produces characteristics (marks) of the points. One aims at learning this function from the examples of marked point patterns in order to predict t…
▽ More
Motivated by the prediction of cell loads in cellular networks, we formulate the following new, fundamental problem of statistical learning of geometric marks of point processes: An unknown marking function, depending on the geometry of point patterns, produces characteristics (marks) of the points. One aims at learning this function from the examples of marked point patterns in order to predict the marks of new point patterns. To approximate (interpolate) the marking function, in our baseline approach, we build a statistical regression model of the marks with respect some local point distance representation. In a more advanced approach, we use a global data representation via the scattering moments of random measures, which build informative and stable to deformations data representation, already proven useful in image analysis and related application domains. In this case, the regression of the scattering moments of the marked point patterns with respect to the non-marked ones is combined with the numerical solution of the inverse problem, where the marks are recovered from the estimated scattering moments. Considering some simple, generic marks, often appearing in the modeling of wireless networks, such as the shot-noise values, nearest neighbour distance, and some characteristics of the Voronoi cells, we show that the scattering moments can capture similar geometry information as the baseline approach, and can reach even better performance, especially for non-local marking functions. Our results motivate further development of statistical learning tools for stochastic geometry and analysis of wireless networks, in particular to predict cell loads in cellular networks from the locations of base stations and traffic demand.
△ Less
Submitted 19 December, 2018;
originally announced December 2018.
-
Phase Harmonic Correlations and Convolutional Neural Networks
Authors:
Stéphane Mallat,
Sixin Zhang,
Gaspar Rochette
Abstract:
A major issue in harmonic analysis is to capture the phase dependence of frequency representations, which carries important signal properties. It seems that convolutional neural networks have found a way. Over time-series and images, convolutional networks often learn a first layer of filters which are well localized in the frequency domain, with different phases. We show that a rectifier then act…
▽ More
A major issue in harmonic analysis is to capture the phase dependence of frequency representations, which carries important signal properties. It seems that convolutional neural networks have found a way. Over time-series and images, convolutional networks often learn a first layer of filters which are well localized in the frequency domain, with different phases. We show that a rectifier then acts as a filter on the phase of the resulting coefficients. It computes signal descriptors which are local in space, frequency and phase. The non-linear phase filter becomes a multiplicative operator over phase harmonics computed with a Fourier transform along the phase. We prove that it defines a bi-Lipschitz and invertible representation. The correlations of phase harmonics coefficients characterise coherent structures from their phase dependence across frequencies. For wavelet filters, we show numerically that signals having sparse wavelet coefficients can be recovered from few phase harmonic correlations, which provide a compressive representation
△ Less
Submitted 29 June, 2019; v1 submitted 29 October, 2018;
originally announced October 2018.
-
Joint Time-Frequency Scattering
Authors:
Joakim Andén,
Vincent Lostanlen,
Stéphane Mallat
Abstract:
In time series classification and regression, signals are typically mapped into some intermediate representation used for constructing models. Since the underlying task is often insensitive to time shifts, these representations are required to be time-shift invariant. We introduce the joint time-frequency scattering transform, a time-shift invariant representation which characterizes the multiscal…
▽ More
In time series classification and regression, signals are typically mapped into some intermediate representation used for constructing models. Since the underlying task is often insensitive to time shifts, these representations are required to be time-shift invariant. We introduce the joint time-frequency scattering transform, a time-shift invariant representation which characterizes the multiscale energy distribution of a signal in time and frequency. It is computed through wavelet convolutions and modulus non-linearities and may therefore be implemented as a deep convolutional neural network whose filters are not learned but calculated from wavelets. We consider the progression from mel-spectrograms to time scattering and joint time-frequency scattering transforms, illustrating the relationship between increased discriminability and refinements of convolutional network architectures. The suitability of the joint time-frequency scattering transform for time-shift invariant characterization of time series is demonstrated through applications to chirp signals and audio synthesis experiments. The proposed transform also obtains state-of-the-art results on several audio classification tasks, outperforming time scattering transforms and achieving accuracies comparable to those of fully learned networks.
△ Less
Submitted 12 July, 2019; v1 submitted 23 July, 2018;
originally announced July 2018.
-
Generative networks as inverse problems with Scattering transforms
Authors:
Tomás Angles,
Stéphane Mallat
Abstract:
Generative Adversarial Nets (GANs) and Variational Auto-Encoders (VAEs) provide impressive image generations from Gaussian white noise, but the underlying mathematics are not well understood. We compute deep convolutional network generators by inverting a fixed embedding operator. Therefore, they do not require to be optimized with a discriminator or an encoder. The embedding is Lipschitz continuo…
▽ More
Generative Adversarial Nets (GANs) and Variational Auto-Encoders (VAEs) provide impressive image generations from Gaussian white noise, but the underlying mathematics are not well understood. We compute deep convolutional network generators by inverting a fixed embedding operator. Therefore, they do not require to be optimized with a discriminator or an encoder. The embedding is Lipschitz continuous to deformations so that generators transform linear interpolations between input white noise vectors into deformations between output images. This embedding is computed with a wavelet Scattering transform. Numerical experiments demonstrate that the resulting Scattering generators have similar properties as GANs or VAEs, without learning a discriminative network or an encoder.
△ Less
Submitted 17 May, 2018;
originally announced May 2018.
-
Solid Harmonic Wavelet Scattering for Predictions of Molecule Properties
Authors:
Michael Eickenberg,
Georgios Exarchakis,
Matthew Hirn,
Stéphane Mallat,
Louis Thiry
Abstract:
We present a machine learning algorithm for the prediction of molecule properties inspired by ideas from density functional theory. Using Gaussian-type orbital functions, we create surrogate electronic densities of the molecule from which we compute invariant "solid harmonic scattering coefficients" that account for different types of interactions at different scales. Multi-linear regressions of v…
▽ More
We present a machine learning algorithm for the prediction of molecule properties inspired by ideas from density functional theory. Using Gaussian-type orbital functions, we create surrogate electronic densities of the molecule from which we compute invariant "solid harmonic scattering coefficients" that account for different types of interactions at different scales. Multi-linear regressions of various physical properties of molecules are computed from these invariant coefficients. Numerical experiments show that these regressions have near state of the art performance, even with relatively few training examples. Predictions over small sets of scattering coefficients can reach a DFT precision while being interpretable.
△ Less
Submitted 1 May, 2018;
originally announced May 2018.
-
Multiscale Hierarchical Convolutional Networks
Authors:
Jörn-Henrik Jacobsen,
Edouard Oyallon,
Stéphane Mallat,
Arnold W. M. Smeulders
Abstract:
Deep neural network algorithms are difficult to analyze because they lack structure allowing to understand the properties of underlying transforms and invariants. Multiscale hierarchical convolutional networks are structured deep convolutional networks where layers are indexed by progressively higher dimensional attributes, which are learned from training data. Each new layer is computed with mult…
▽ More
Deep neural network algorithms are difficult to analyze because they lack structure allowing to understand the properties of underlying transforms and invariants. Multiscale hierarchical convolutional networks are structured deep convolutional networks where layers are indexed by progressively higher dimensional attributes, which are learned from training data. Each new layer is computed with multidimensional convolutions along spatial and attribute variables. We introduce an efficient implementation of such networks where the dimensionality is progressively reduced by averaging intermediate layers along attribute indices. Hierarchical networks are tested on CIFAR image data bases where they obtain comparable precisions to state of the art networks, with much fewer parameters. We study some properties of the attributes learned from these databases.
△ Less
Submitted 12 March, 2017;
originally announced March 2017.
-
Inverse Problems with Invariant Multiscale Statistics
Authors:
Ivan Dokmanić,
Joan Bruna,
Stéphane Mallat,
Maarten de Hoop
Abstract:
We propose a new approach to linear ill-posed inverse problems. Our algorithm alternates between enforcing two constraints: the measurements and the statistical correlation structure in some transformed space. We use a non-linear multiscale scattering transform which discards the phase and thus exposes strong spectral correlations otherwise hidden beneath the phase fluctuations. As a result, both…
▽ More
We propose a new approach to linear ill-posed inverse problems. Our algorithm alternates between enforcing two constraints: the measurements and the statistical correlation structure in some transformed space. We use a non-linear multiscale scattering transform which discards the phase and thus exposes strong spectral correlations otherwise hidden beneath the phase fluctuations. As a result, both constraints may be put into effect by linear projections in their respective spaces. We apply the algorithm to super-resolution and tomography and show that it outperforms ad hoc convex regularizers and stably recovers the missing spectrum.
△ Less
Submitted 2 December, 2018; v1 submitted 18 September, 2016;
originally announced September 2016.
-
Understanding Deep Convolutional Networks
Authors:
Stéphane Mallat
Abstract:
Deep convolutional networks provide state of the art classifications and regressions results over many high-dimensional problems. We review their architecture, which scatters data with a cascade of linear filter weights and non-linearities. A mathematical framework is introduced to analyze their properties. Computations of invariants involve multiscale contractions, the linearization of hierarchic…
▽ More
Deep convolutional networks provide state of the art classifications and regressions results over many high-dimensional problems. We review their architecture, which scatters data with a cascade of linear filter weights and non-linearities. A mathematical framework is introduced to analyze their properties. Computations of invariants involve multiscale contractions, the linearization of hierarchical symmetries, and sparse separations. Applications are discussed.
△ Less
Submitted 19 January, 2016;
originally announced January 2016.
-
Wavelet Scattering on the Pitch Spiral
Authors:
Vincent Lostanlen,
Stéphane Mallat
Abstract:
We present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time-frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in ti…
▽ More
We present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time-frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in time, log-frequency, and octave index. We give a closed-form approximation of spiral scattering coefficients for a nonstationary generalization of the harmonic source-filter model.
△ Less
Submitted 3 January, 2016;
originally announced January 2016.
-
Joint Time-Frequency Scattering for Audio Classification
Authors:
Joakim Andén,
Vincent Lostanlen,
Stéphane Mallat
Abstract:
We introduce the joint time-frequency scattering transform, a time shift invariant descriptor of time-frequency structure for audio classification. It is obtained by applying a two-dimensional wavelet transform in time and log-frequency to a time-frequency wavelet scalogram. We show that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and fr…
▽ More
We introduce the joint time-frequency scattering transform, a time shift invariant descriptor of time-frequency structure for audio classification. It is obtained by applying a two-dimensional wavelet transform in time and log-frequency to a time-frequency wavelet scalogram. We show that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and frequency modulated excitations. State-of-the-art results are achieved for signal reconstruction and phone segment classification on the TIMIT dataset.
△ Less
Submitted 7 December, 2015;
originally announced December 2015.
-
Deep Haar Scattering Networks
Authors:
Xiuyuan Cheng,
Xu Chen,
Stephane Mallat
Abstract:
An orthogonal Haar scattering transform is a deep network, computed with a hierarchy of additions, subtractions and absolute values, over pairs of coefficients. It provides a simple mathematical model for unsupervised deep network learning. It implements non-linear contractions, which are optimized for classification, with an unsupervised pair matching algorithm, of polynomial complexity. A struct…
▽ More
An orthogonal Haar scattering transform is a deep network, computed with a hierarchy of additions, subtractions and absolute values, over pairs of coefficients. It provides a simple mathematical model for unsupervised deep network learning. It implements non-linear contractions, which are optimized for classification, with an unsupervised pair matching algorithm, of polynomial complexity. A structured Haar scattering over graph data computes permutation invariant representations of groups of connected points in the graph. If the graph connectivity is unknown, unsupervised Haar pair learning can provide a consistent estimation of connected dyadic groups of points. Classification results are given on image data bases, defined on regular grids or graphs, with a connectivity which may be known or unknown.
△ Less
Submitted 30 September, 2015;
originally announced September 2015.
-
Transformée en scattering sur la spirale temps-chroma-octave
Authors:
Vincent Lostanlen,
Stéphane Mallat
Abstract:
We introduce a scattering representation for the analysis and classification of sounds. It is locally translation-invariant, stable to deformations in time and frequency, and has the ability to capture harmonic structures. The scattering representation can be interpreted as a convolutional neural network which cascades a wavelet transform in time and along a harmonic spiral. We study its applicati…
▽ More
We introduce a scattering representation for the analysis and classification of sounds. It is locally translation-invariant, stable to deformations in time and frequency, and has the ability to capture harmonic structures. The scattering representation can be interpreted as a convolutional neural network which cascades a wavelet transform in time and along a harmonic spiral. We study its application for the analysis of the deformations of the source-filter model.
△ Less
Submitted 1 September, 2015;
originally announced September 2015.
-
Quantum Energy Regression using Scattering Transforms
Authors:
Matthew Hirn,
Nicolas Poilvert,
Stéphane Mallat
Abstract:
We present a novel approach to the regression of quantum mechanical energies based on a scattering transform of an intermediate electron density representation. A scattering transform is a deep convolution network computed with a cascade of multiscale wavelet transforms. It possesses appropriate invariant and stability properties for quantum energy regression. This new framework removes fundamenta…
▽ More
We present a novel approach to the regression of quantum mechanical energies based on a scattering transform of an intermediate electron density representation. A scattering transform is a deep convolution network computed with a cascade of multiscale wavelet transforms. It possesses appropriate invariant and stability properties for quantum energy regression. This new framework removes fundamental limitations of Coulomb matrix based energy regressions, and numerical experiments give state-of-the-art accuracy over planar molecules.
△ Less
Submitted 20 May, 2016; v1 submitted 6 February, 2015;
originally announced February 2015.
-
Deep Roto-Translation Scattering for Object Classification
Authors:
Edouard Oyallon,
Stéphane Mallat
Abstract:
Dictionary learning algorithms or supervised deep convolution networks have considerably improved the efficiency of predefined feature representations such as SIFT. We introduce a deep scattering convolution network, with predefined wavelet filters over spatial and angular variables. This representation brings an important improvement to results previously obtained with predefined features over ob…
▽ More
Dictionary learning algorithms or supervised deep convolution networks have considerably improved the efficiency of predefined feature representations such as SIFT. We introduce a deep scattering convolution network, with predefined wavelet filters over spatial and angular variables. This representation brings an important improvement to results previously obtained with predefined features over object image databases such as Caltech and CIFAR. The resulting accuracy is comparable to results obtained with unsupervised deep learning and dictionary based representations. This shows that refining image representations by using geometric priors is a promising direction to improve image classification and its understanding.
△ Less
Submitted 30 May, 2015; v1 submitted 30 December, 2014;
originally announced December 2014.
-
Unsupervised Deep Haar Scattering on Graphs
Authors:
Xu Chen,
Xiuyuan Cheng,
Stéphane Mallat
Abstract:
The classification of high-dimensional data defined on graphs is particularly difficult when the graph geometry is unknown. We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors. It is implemented with a deep cascade of additions, subtractions and absolute values, which iteratively compute orthogonal Haar wavelet transforms. Multiscale neighborhoods of unk…
▽ More
The classification of high-dimensional data defined on graphs is particularly difficult when the graph geometry is unknown. We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors. It is implemented with a deep cascade of additions, subtractions and absolute values, which iteratively compute orthogonal Haar wavelet transforms. Multiscale neighborhoods of unknown graphs are estimated by minimizing an average total variation, with a pair matching algorithm of polynomial complexity. Supervised classification with dimension reduction is tested on data bases of scrambled images, and for signals sampled on unknown irregular grids on a sphere.
△ Less
Submitted 3 November, 2014; v1 submitted 9 June, 2014;
originally announced June 2014.
-
Phase retrieval for the Cauchy wavelet transform
Authors:
Stéphane Mallat,
Irène Waldspurger
Abstract:
We consider the phase retrieval problem in which one tries to reconstruct a function from the modulus of its wavelet transform. We study the unicity and stability of the reconstruction. In the case where the wavelets are Cauchy wavelets, we prove that the modulus of the wavelet transform uniquely determines the function up to a global phase. We show that the reconstruction operator is continuous b…
▽ More
We consider the phase retrieval problem in which one tries to reconstruct a function from the modulus of its wavelet transform. We study the unicity and stability of the reconstruction. In the case where the wavelets are Cauchy wavelets, we prove that the modulus of the wavelet transform uniquely determines the function up to a global phase. We show that the reconstruction operator is continuous but not uniformly continuous. We describe how to construct pairs of functions which are far away in $L^2$-norm but whose wavelet transforms are very close, in modulus. The principle is to modulate the wavelet transform of a fixed initial function by a phase which varies slowly in both time and frequency. This construction seems to cover all the instabilities that we observe in practice; we give a partial formal justification to this fact. Finally, we describe an exact reconstruction algorithm and use it to numerically confirm our analysis of the stability question.
△ Less
Submitted 3 April, 2017; v1 submitted 4 April, 2014;
originally announced April 2014.
-
Rigid-Motion Scattering for Texture Classification
Authors:
Laurent SIfre,
Stéphane Mallat
Abstract:
A rigid-motion scattering computes adaptive invariants along translations and rotations, with a deep convolutional network. Convolutions are calculated on the rigid-motion group, with wavelets defined on the translation and rotation variables. It preserves joint rotation and translation information, while providing global invariants at any desired scale. Texture classification is studied, through…
▽ More
A rigid-motion scattering computes adaptive invariants along translations and rotations, with a deep convolutional network. Convolutions are calculated on the rigid-motion group, with wavelets defined on the translation and rotation variables. It preserves joint rotation and translation information, while providing global invariants at any desired scale. Texture classification is studied, through the characterization of stationary processes from a single realization. State-of-the-art results are obtained on multiple texture data bases, with important rotation and scaling variabilities.
△ Less
Submitted 7 March, 2014;
originally announced March 2014.
-
Generic Deep Networks with Wavelet Scattering
Authors:
Edouard Oyallon,
Stéphane Mallat,
Laurent Sifre
Abstract:
We introduce a two-layer wavelet scattering network, for object classification. This scattering transform computes a spatial wavelet transform on the first layer and a new joint wavelet transform along spatial, angular and scale variables in the second layer. Numerical experiments demonstrate that this two layer convolution network, which involves no learning and no max pooling, performs efficient…
▽ More
We introduce a two-layer wavelet scattering network, for object classification. This scattering transform computes a spatial wavelet transform on the first layer and a new joint wavelet transform along spatial, angular and scale variables in the second layer. Numerical experiments demonstrate that this two layer convolution network, which involves no learning and no max pooling, performs efficiently on complex image data sets such as CalTech, with structural objects variability and clutter. It opens the possibility to simplify deep neural network learning by initializing the first layers with wavelet filters.
△ Less
Submitted 10 March, 2014; v1 submitted 20 December, 2013;
originally announced December 2013.
-
Audio Texture Synthesis with Scattering Moments
Authors:
Joan Bruna,
Stéphane Mallat
Abstract:
We introduce an audio texture synthesis algorithm based on scattering moments. A scattering transform is computed by iteratively decomposing a signal with complex wavelet filter banks and computing their amplitude envelop. Scattering moments provide general representations of stationary processes computed as expected values of scattering coefficients. They are estimated with low variance estimator…
▽ More
We introduce an audio texture synthesis algorithm based on scattering moments. A scattering transform is computed by iteratively decomposing a signal with complex wavelet filter banks and computing their amplitude envelop. Scattering moments provide general representations of stationary processes computed as expected values of scattering coefficients. They are estimated with low variance estimators from single realizations. Audio signals having prescribed scattering moments are synthesized with a gradient descent algorithms. Audio synthesis examples show that scattering representation provide good synthesis of audio textures with much fewer coefficients than the state of the art.
△ Less
Submitted 2 November, 2013;
originally announced November 2013.
-
Wavelet methods for shape perception in electro-sensing
Authors:
Habib Ammari,
Stéphane Mallat,
Irène Waldspurger,
Han Wang
Abstract:
This paper aims at presenting a new approach to the electro-sensing problem using wavelets. It provides an efficient algorithm for recognizing the shape of a target from micro-electrical impedance measurements. Stability and resolution capabilities of the proposed algorithm are quantified in numerical simulations.
This paper aims at presenting a new approach to the electro-sensing problem using wavelets. It provides an efficient algorithm for recognizing the shape of a target from micro-electrical impedance measurements. Stability and resolution capabilities of the proposed algorithm are quantified in numerical simulations.
△ Less
Submitted 10 October, 2013;
originally announced October 2013.
-
Deep Learning by Scattering
Authors:
Stéphane Mallat,
Irène Waldspurger
Abstract:
We introduce general scattering transforms as mathematical models of deep neural networks with l2 pooling. Scattering networks iteratively apply complex valued unitary operators, and the pooling is performed by a complex modulus. An expected scattering defines a contractive representation of a high-dimensional probability distribution, which preserves its mean-square norm. We show that unsupervise…
▽ More
We introduce general scattering transforms as mathematical models of deep neural networks with l2 pooling. Scattering networks iteratively apply complex valued unitary operators, and the pooling is performed by a complex modulus. An expected scattering defines a contractive representation of a high-dimensional probability distribution, which preserves its mean-square norm. We show that unsupervised learning can be casted as an optimization of the space contraction to preserve the volume occupied by unlabeled examples, at each layer of the network. Supervised learning and classification are performed with an averaged scattering, which provides scattering estimations for multiple classes.
△ Less
Submitted 25 June, 2015; v1 submitted 24 June, 2013;
originally announced June 2013.
-
Deep Scattering Spectrum
Authors:
Joakim Andén,
Stéphane Mallat
Abstract:
A scattering transform defines a locally translation invariant representation which is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation.…
▽ More
A scattering transform defines a locally translation invariant representation which is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation. A frequency transposition invariant representation is obtained by applying a scattering transform along log-frequency. State-the-of-art classification results are obtained for musical genre and phone classification on GTZAN and TIMIT databases, respectively.
△ Less
Submitted 10 January, 2014; v1 submitted 24 April, 2013;
originally announced April 2013.
-
Invariant Scattering Convolution Networks
Authors:
Joan Bruna,
Stéphane Mallat
Abstract:
A wavelet scattering network computes a translation invariant image representation, which is stable to deformations and preserves high frequency information for classification. It cascades wavelet transform convolutions with non-linear modulus and averaging operators. The first network layer outputs SIFT-type descriptors whereas the next layers provide complementary invariant information which imp…
▽ More
A wavelet scattering network computes a translation invariant image representation, which is stable to deformations and preserves high frequency information for classification. It cascades wavelet transform convolutions with non-linear modulus and averaging operators. The first network layer outputs SIFT-type descriptors whereas the next layers provide complementary invariant information which improves classification. The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification.
A scattering representation of stationary processes incorporates higher order moments and can thus discriminate textures having the same Fourier power spectrum. State of the art classification results are obtained for handwritten digits and texture discrimination, using a Gaussian kernel SVM and a generative PCA classifier.
△ Less
Submitted 8 March, 2012; v1 submitted 5 March, 2012;
originally announced March 2012.
-
Classification with Invariant Scattering Representations
Authors:
Joan Bruna,
Stéphane Mallat
Abstract:
A scattering transform defines a signal representation which is invariant to translations and Lipschitz continuous relatively to deformations. It is implemented with a non-linear convolution network that iterates over wavelet and modulus operators. Lipschitz continuity locally linearizes deformations. Complex classes of signals and textures can be modeled with low-dimensional affine spaces, comput…
▽ More
A scattering transform defines a signal representation which is invariant to translations and Lipschitz continuous relatively to deformations. It is implemented with a non-linear convolution network that iterates over wavelet and modulus operators. Lipschitz continuity locally linearizes deformations. Complex classes of signals and textures can be modeled with low-dimensional affine spaces, computed with a PCA in the scattering domain. Classification is performed with a penalized model selection. State of the art results are obtained for handwritten digit recognition over small training sets, and for texture classification.
△ Less
Submitted 5 December, 2011;
originally announced December 2011.
-
Geometric Models with Co-occurrence Groups
Authors:
Joan Bruna,
Stéphane Mallat
Abstract:
A geometric model of sparse signal representations is introduced for classes of signals. It is computed by optimizing co-occurrence groups with a maximum likelihood estimate calculated with a Bernoulli mixture model. Applications to face image compression and MNIST digit classification illustrate the applicability of this model.
A geometric model of sparse signal representations is introduced for classes of signals. It is computed by optimizing co-occurrence groups with a maximum likelihood estimate calculated with a Bernoulli mixture model. Applications to face image compression and MNIST digit classification illustrate the applicability of this model.
△ Less
Submitted 30 January, 2011;
originally announced January 2011.
-
Group Invariant Scattering
Authors:
Stéphane Mallat
Abstract:
This paper constructs translation invariant operators on L2(R^d), which are Lipschitz continuous to the action of diffeomorphisms. A scattering propagator is a path ordered product of non-linear and non-commuting operators, each of which computes the modulus of a wavelet transform. A local integration defines a windowed scattering transform, which is proved to be Lipschitz continuous to the action…
▽ More
This paper constructs translation invariant operators on L2(R^d), which are Lipschitz continuous to the action of diffeomorphisms. A scattering propagator is a path ordered product of non-linear and non-commuting operators, each of which computes the modulus of a wavelet transform. A local integration defines a windowed scattering transform, which is proved to be Lipschitz continuous to the action of diffeomorphisms. As the window size increases, it converges to a wavelet scattering transform which is translation invariant. Scattering coefficients also provide representations of stationary processes. Expected values depend upon high order moments and can discriminate processes having the same power spectrum. Scattering operators are extended on L2 (G), where G is a compact Lie group, and are invariant under the action of G. Combining a scattering on L2(R^d) and on Ld (SO(d)) defines a translation and rotation invariant scattering on L2(R^d).
△ Less
Submitted 15 April, 2012; v1 submitted 12 January, 2011;
originally announced January 2011.
-
Classification with Scattering Operators
Authors:
Joan Bruna,
Stéphane Mallat
Abstract:
A scattering vector is a local descriptor including multiscale and multi-direction co-occurrence information. It is computed with a cascade of wavelet decompositions and complex modulus. This scattering representation is locally translation invariant and linearizes deformations. A supervised classification algorithm is computed with a PCA model selection on scattering vectors. State of the art res…
▽ More
A scattering vector is a local descriptor including multiscale and multi-direction co-occurrence information. It is computed with a cascade of wavelet decompositions and complex modulus. This scattering representation is locally translation invariant and linearizes deformations. A supervised classification algorithm is computed with a PCA model selection on scattering vectors. State of the art results are obtained for handwritten digit recognition and texture classification.
△ Less
Submitted 20 November, 2013; v1 submitted 12 November, 2010;
originally announced November 2010.
-
Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity
Authors:
Guoshen Yu,
Guillermo Sapiro,
Stéphane Mallat
Abstract:
A general framework for solving image inverse problems is introduced in this paper. The approach is based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm. A dual mathematical interpretation of the proposed framework with structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared…
▽ More
A general framework for solving image inverse problems is introduced in this paper. The approach is based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm. A dual mathematical interpretation of the proposed framework with structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared to traditional sparse inverse problem techniques. This interpretation also suggests an effective dictionary motivated initialization for the MAP-EM algorithm. We demonstrate that in a number of image inverse problems, including inpainting, zooming, and deblurring, the same algorithm produces either equal, often significantly better, or very small margin worse results than the best published ones, at a lower computational cost.
△ Less
Submitted 15 June, 2010;
originally announced June 2010.