-
A majorization-minimization algorithm for nonnegative binary matrix factorization
Authors:
Paul Magron,
Cédric Févotte
Abstract:
This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enable interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.
Submitted 20 April, 2022;
originally announced April 2022.
-
Accelerating Non-Negative and Bounded-Variable Linear Regression Algorithms with Safe Screening
Authors:
Cassio F. Dantas,
Emmanuel Soubies,
Cédric Févotte
Abstract:
Non-negative and bounded-variable linear regression problems arise in a variety of applications in machine learning and signal processing. In this paper, we propose a technique to accelerate existing solvers for these problems by identifying saturated coordinates in the course of iterations. This is akin to safe screening techniques previously proposed for sparsity-regularized regression problems. The proposed strategy is provably safe as it provides theoretical guarantees that the identified coordinates are indeed saturated in the optimal solution. Experimental results on synthetic and real data show compelling accelerations for both non-negative and bounded-variable problems.
Submitted 26 June, 2023; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Joint Majorization-Minimization for Nonnegative Matrix Factorization with the $β$-divergence
Authors:
Arthur Marmin,
José Henrique de Morais Goulart,
Cédric Févotte
Abstract:
This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $β$-divergence objective function. Our new updates are derived from a joint majorization-minimization (MM) scheme, in which an auxiliary function (a tight upper bound of the objective function) is built for the two factors jointly and minimized at each iteration. This is in contrast with the classic approach in which a majorizer is derived for each factor separately. Like that classic approach, our joint MM algorithm also results in multiplicative updates that are simple to implement. They however yield a significant drop of computation time (for equally good solutions), in particular for some $β$-divergences of important applicative interest, such as the squared Euclidean distance and the Kullback-Leibler or Itakura-Saito divergences. We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts. Depending on the value of $β$ and on the dataset, our joint MM approach can yield CPU time reductions from about $13\%$ to $78\%$ in comparison to the classic alternating scheme.
Submitted 17 April, 2023; v1 submitted 29 June, 2021;
originally announced June 2021.
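As a point of comparison for the abstract above, the following is a minimal sketch of the classic alternating multiplicative updates for NMF with the $β$-divergence, i.e. the baseline scheme, not the joint MM updates proposed in the paper. The function names (`beta_div`, `update_H`, `nmf_mu`), the random initialization and the pure-Python list representation are assumptions made for illustration only. For $β$ between 1 and 2 these updates are commonly reported not to increase the objective.

```python
import math
import random

def beta_div(x, y, beta):
    """Scalar beta-divergence d_beta(x | y) for x, y > 0."""
    if beta == 1:   # generalized Kullback-Leibler divergence
        return x * math.log(x / y) - x + y
    if beta == 0:   # Itakura-Saito divergence
        return x / y - math.log(x / y) - 1
    return (x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1)) / (beta * (beta - 1))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cost(V, W, H, beta):
    """Sum of entrywise beta-divergences between V and the product WH."""
    WH = matmul(W, H)
    return sum(beta_div(v, vh, beta) for rv, rvh in zip(V, WH) for v, vh in zip(rv, rvh))

def update_H(V, W, H, beta):
    """Multiplicative update: H <- H * (W^T (V * (WH)^(beta-2))) / (W^T (WH)^(beta-1))."""
    WH = matmul(W, H)
    num_arg = [[v * vh**(beta - 2) for v, vh in zip(rv, rvh)] for rv, rvh in zip(V, WH)]
    den_arg = [[vh**(beta - 1) for vh in rvh] for rvh in WH]
    Wt = transpose(W)
    num = matmul(Wt, num_arg)
    den = matmul(Wt, den_arg)
    return [[h * n / d for h, n, d in zip(rh, rn, rd)] for rh, rn, rd in zip(H, num, den)]

def nmf_mu(V, rank, beta=1.0, n_iter=50, seed=0):
    """Alternate the H and W updates; returns the factors and the cost history."""
    rng = random.Random(seed)
    F, N = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(F)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(N)] for _ in range(rank)]
    costs = [cost(V, W, H, beta)]
    for _ in range(n_iter):
        H = update_H(V, W, H, beta)
        # The W update is the H update applied to the transposed problem V^T ~ H^T W^T.
        W = transpose(update_H(transpose(V), transpose(H), transpose(W), beta))
        costs.append(cost(V, W, H, beta))
    return W, H, costs
```

Reusing `update_H` on the transposed problem keeps the sketch short; a practical implementation would use an array library and guard against zero entries in the data.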
-
Unbalanced Optimal Transport through Non-negative Penalized Linear Regression
Authors:
Laetitia Chapel,
Rémi Flamary,
Haoran Wu,
Cédric Févotte,
Gilles Gasso
Abstract:
This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired from inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuity of piece-wise linear OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion about more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.
Submitted 8 June, 2021;
originally announced June 2021.
-
Positive Semidefinite Matrix Factorization: A Connection with Phase Retrieval and Affine Rank Minimization
Authors:
Dana Lahat,
Yanbin Lang,
Vincent Y. F. Tan,
Cédric Févotte
Abstract:
Positive semidefinite matrix factorization (PSDMF) expresses each entry of a nonnegative matrix as the inner product of two positive semidefinite (psd) matrices. When all these psd matrices are constrained to be diagonal, this model is equivalent to nonnegative matrix factorization. Applications include combinatorial optimization, quantum-based statistical models, and recommender systems, among others. However, despite the increasing interest in PSDMF, only a few PSDMF algorithms have been proposed in the literature. In this work, we provide a collection of tools for PSDMF, by showing that PSDMF algorithms can be designed based on phase retrieval (PR) and affine rank minimization (ARM) algorithms. This procedure offers a shortcut in designing new PSDMF algorithms, as it allows some of the useful numerical properties of existing PR and ARM methods to be carried over to the PSDMF framework. Motivated by this idea, we introduce a new family of PSDMF algorithms based on iterative hard thresholding (IHT). This family subsumes previously-proposed projected gradient PSDMF methods. We show that there is high variability among PSDMF optimization problems that makes it beneficial to try a number of methods based on different principles to tackle difficult problems. In certain cases, our proposed methods are the only algorithms able to find a solution. In certain other cases, they converge faster. Our results support our claim that the PSDMF framework can inherit desired numerical properties from PR and ARM algorithms, leading to more efficient PSDMF algorithms, and motivate further study of the links between these models.
Submitted 2 April, 2021; v1 submitted 24 July, 2020;
originally announced July 2020.
-
A Comparative Study of Gamma Markov Chains for Temporal Non-Negative Matrix Factorization
Authors:
Louis Filstroff,
Olivier Gouvert,
Cédric Févotte,
Olivier Cappé
Abstract:
Non-negative matrix factorization (NMF) has become a well-established class of methods for the analysis of non-negative data. In particular, a lot of effort has been devoted to probabilistic NMF, namely estimation or inference tasks in probabilistic models describing the data, based for example on Poisson or exponential likelihoods. When dealing with time series data, several works have proposed to model the evolution of the activation coefficients as a non-negative Markov chain, most of the time in relation with the Gamma distribution, giving rise to so-called temporal NMF models. In this paper, we review four Gamma Markov chains of the NMF literature, and show that they all share the same drawback: the absence of a well-defined stationary distribution. We then introduce a fifth process, an overlooked model of the time series literature named BGAR(1), which overcomes this limitation. These temporal NMF models are then compared in a MAP framework on a prediction task, in the context of the Poisson likelihood.
Submitted 25 February, 2021; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Ordinal Non-negative Matrix Factorization for Recommendation
Authors:
Olivier Gouvert,
Thomas Oberlin,
Cédric Févotte
Abstract:
We introduce a new non-negative matrix factorization (NMF) method for ordinal data, called OrdNMF. Ordinal data are categorical data which exhibit a natural ordering between the categories. In particular, they can be found in recommender systems, either with explicit data (such as ratings) or implicit data (such as quantized play counts). OrdNMF is a probabilistic latent factor model that generalizes Bernoulli-Poisson factorization (BePoF) and Poisson factorization (PF) applied to binarized data. Contrary to these methods, OrdNMF circumvents binarization and can exploit a more informative representation of the data. We design an efficient variational algorithm based on a suitable model augmentation and related to variational PF. In particular, our algorithm preserves the scalability of PF and can be applied to huge sparse datasets. We report recommendation experiments on explicit and implicit datasets, and show that OrdNMF outperforms BePoF and PF applied to binarized data.
Submitted 2 September, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Recommendation from Raw Data with Adaptive Compound Poisson Factorization
Authors:
Olivier Gouvert,
Thomas Oberlin,
Cédric Févotte
Abstract:
Count data are often used in recommender systems: they are widespread (song play counts, product purchases, clicks on web pages) and can reveal user preference without any explicit rating from the user. Such data are known to be sparse, over-dispersed and bursty, which makes their direct use in recommender systems challenging, often leading to pre-processing steps such as binarization. The aim of this paper is to build recommender systems from these raw data, by means of the recently proposed compound Poisson Factorization (cPF). The paper contributions are three-fold: we present a unified framework for discrete data (dcPF), leading to an adaptive and scalable algorithm; we show that our framework achieves a trade-off between Poisson Factorization (PF) applied to raw and binarized data; we study four specific instances that are relevant to recommendation and exhibit new links with combinatorics. Experiments with three different datasets show that dcPF is able to effectively adjust to over-dispersion, leading to better recommendation scores when compared with PF on either raw or binarized data.
Submitted 9 July, 2019; v1 submitted 20 May, 2019;
originally announced May 2019.
-
An Inertial Newton Algorithm for Deep Learning
Authors:
Camille Castera,
Jérôme Bolte,
Cédric Févotte,
Edouard Pauwels
Abstract:
We introduce a new second-order inertial optimization method for machine learning called INNA. It exploits the geometry of the loss function while only requiring stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and adapted to large-scale optimization problems such as the training of deep neural networks. The algorithm combines both gradient-descent and Newton-like behaviors as well as inertia. We prove the convergence of INNA for most deep learning problems. To do so, we provide a well-suited framework to analyze deep learning loss functions involving tame optimization in which we study a continuous dynamical system together with its discrete stochastic approximations. We prove sublinear convergence for the continuous-time differential inclusion which underlies our algorithm. Additionally, we also show how standard optimization mini-batch methods applied to non-smooth non-convex problems can yield a certain type of spurious stationary points never discussed before. We address this issue by providing a theoretical framework around the new idea of $D$-criticality; we then give a simple asymptotic analysis of INNA. Our algorithm allows for using an aggressive learning rate of $o(1/\log k)$. From an empirical viewpoint, we show that INNA returns competitive results with respect to state of the art (stochastic gradient descent, ADAGRAD, ADAM) on popular deep learning benchmark problems.
Submitted 28 July, 2021; v1 submitted 29 May, 2019;
originally announced May 2019.
-
A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments
Authors:
Rui Xia,
Vincent Y. F. Tan,
Louis Filstroff,
Cédric Févotte
Abstract:
We propose a novel ranking model that combines the Bradley-Terry-Luce probability model with a nonnegative matrix factorization framework to model and uncover the presence of latent variables that influence the performance of top tennis players. We derive an efficient, provably convergent, and numerically stable majorization-minimization-based algorithm to maximize the likelihood of datasets under the proposed statistical model. The model is tested on datasets involving the outcomes of matches between 20 top male and female tennis players over 14 major tournaments for men (including the Grand Slams and the ATP Masters 1000) and 16 major tournaments for women over the past 10 years. Our model automatically infers that the surface of the court (e.g., clay or hard court) is a key determinant of the performances of male players, but less so for females. Top players on various surfaces over this longitudinal period are also identified in an objective manner.
Submitted 12 June, 2019; v1 submitted 15 March, 2019;
originally announced March 2019.
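For orientation, the standard Bradley-Terry-Luce model mentioned in the abstract above assigns each player a positive skill parameter and models the outcome of a match as below. The symbols $\lambda_i$ and $\lambda_j$ are notation assumed here for illustration, and the paper's NMF-based extension of this model is not reproduced.

```latex
\Pr(i \text{ beats } j) = \frac{\lambda_i}{\lambda_i + \lambda_j},
\qquad \lambda_i, \lambda_j > 0 .
```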
-
Bayesian Mean-parameterized Nonnegative Binary Matrix Factorization
Authors:
Alberto Lumbreras,
Louis Filstroff,
Cédric Févotte
Abstract:
Binary data matrices can represent many types of data such as social networks, votes, or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the $[0,1]$ range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage of leading to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parameterization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonparametric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods on multiple datasets for different tasks such as dictionary learning and prediction of missing data. Experiments show that our methods provide results similar or superior to the state of the art, while automatically detecting the number of relevant components.
Submitted 20 June, 2020; v1 submitted 17 December, 2018;
originally announced December 2018.
-
A Quasi-Newton algorithm on the orthogonal manifold for NMF with transform learning
Authors:
Pierre Ablin,
Dylan Fagot,
Herwig Wendt,
Alexandre Gramfort,
Cédric Févotte
Abstract:
Nonnegative matrix factorization (NMF) is a popular method for audio spectral unmixing. While NMF is traditionally applied to off-the-shelf time-frequency representations based on the short-time Fourier or Cosine transforms, the ability to learn transforms from raw data attracts increasing attention. However, this adds an important computational overhead. When assumed orthogonal (like the Fourier or Cosine transforms), learning the transform yields a non-convex optimization problem on the orthogonal matrix manifold. In this paper, we derive a quasi-Newton method on the manifold using sparse approximations of the Hessian. Experiments on synthetic and real audio data show that the proposed algorithm out-performs state-of-the-art first-order and coordinate-descent methods by orders of magnitude. A Python package for fast TL-NMF is released online at https://github.com/pierreablin/tlnmf.
Submitted 6 November, 2018;
originally announced November 2018.
-
Factor analysis of dynamic PET images: beyond Gaussian noise
Authors:
Yanna Cruz Cavalcanti,
Thomas Oberlin,
Nicolas Dobigeon,
Cédric Févotte,
Simon Stute,
Maria-Joao Ribeiro,
Clovis Tauber
Abstract:
Factor analysis has proven to be a relevant tool for extracting tissue time-activity curves (TACs) in dynamic PET images, since it allows for an unsupervised analysis of the data. Reliable and interpretable results are possible only if considered with respect to suitable noise statistics. However, the noise in reconstructed dynamic PET images is very difficult to characterize, despite the Poissonian nature of the count-rates. Rather than explicitly modeling the noise distribution, this work proposes to study the relevance of several divergence measures to be used within a factor analysis framework. To this end, the $β$-divergence, widely used in other applicative domains, is considered to design the data-fitting term involved in three different factor models. The performances of the resulting algorithms are evaluated for different values of $β$, in a range covering Gaussian, Poissonian and Gamma-distributed noises. The results obtained on two different types of synthetic images and one real image show the interest of applying non-standard values of $β$ to improve factor analysis.
Submitted 26 March, 2019; v1 submitted 30 July, 2018;
originally announced July 2018.
-
Closed-form Marginal Likelihood in Gamma-Poisson Matrix Factorization
Authors:
Louis Filstroff,
Alberto Lumbreras,
Cédric Févotte
Abstract:
We present novel understandings of the Gamma-Poisson (GaP) model, a probabilistic matrix factorization model for count data. We show that GaP can be rewritten free of the score/activation matrix. This gives us new insights about the estimation of the topic/dictionary matrix by maximum marginal likelihood estimation. In particular, this explains the robustness of this estimator to over-specified values of the factorization rank, especially its ability to automatically prune irrelevant dictionary columns, as empirically observed in previous work. The marginalization of the activation matrix leads in turn to a new Monte Carlo Expectation-Maximization algorithm with favorable properties.
Submitted 31 May, 2018; v1 submitted 5 January, 2018;
originally announced January 2018.
-
Negative Binomial Matrix Factorization for Recommender Systems
Authors:
Olivier Gouvert,
Thomas Oberlin,
Cédric Févotte
Abstract:
We introduce negative binomial matrix factorization (NBMF), a matrix factorization technique specially designed for analyzing over-dispersed count data. It can be viewed as an extension of Poisson matrix factorization (PF) perturbed by a multiplicative term which models exposure. This term brings a degree of freedom for controlling the dispersion, making NBMF more robust to outliers. We show that NBMF makes it possible to skip traditional pre-processing stages, such as binarization, which lead to a loss of information. Two estimation approaches are presented: maximum likelihood and variational Bayes inference. We test our model with a recommendation task and show its ability to predict user tastes with better precision than PF.
Submitted 5 January, 2018;
originally announced January 2018.
-
Optimal spectral transportation with application to music transcription
Authors:
Rémi Flamary,
Cédric Févotte,
Nicolas Courty,
Valentin Emiya
Abstract:
Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from one frequency bin to another as well as variations of timbre can disproportionately harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically-related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turn gives ground to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data.
Submitted 10 October, 2016; v1 submitted 30 September, 2016;
originally announced September 2016.
-
Nonlinear hyperspectral unmixing with robust nonnegative matrix factorization
Authors:
Cédric Févotte,
Nicolas Dobigeon
Abstract:
This paper introduces a robust mixing model to describe hyperspectral data resulting from the mixture of several pure spectral signatures. This new model not only generalizes the commonly used linear mixing model, but also allows for possible nonlinear effects to be easily handled, relying on mild assumptions regarding these nonlinearities. The standard nonnegativity and sum-to-one constraints inherent to spectral unmixing are coupled with a group-sparse constraint imposed on the nonlinearity component. This results in a new form of robust nonnegative matrix factorization. The data fidelity term is expressed as a beta-divergence, a continuous family of dissimilarity measures that takes the squared Euclidean distance and the generalized Kullback-Leibler divergence as special cases. The penalized objective is minimized with a block-coordinate descent that involves majorization-minimization updates. Simulation results obtained on synthetic and real data show that the proposed strategy competes with state-of-the-art linear and nonlinear unmixing methods.
Submitted 6 March, 2014; v1 submitted 22 January, 2014;
originally announced January 2014.
-
Automatic Relevance Determination in Nonnegative Matrix Factorization with the β-Divergence
Authors:
Vincent Y. F. Tan,
Cédric Févotte
Abstract:
This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the β-divergence. The β-divergence is a family of cost functions that includes the squared Euclidean distance, Kullback-Leibler and Itakura-Saito divergences as special cases. Learning the model order is important as it is necessary to strike the right balance between data fidelity and overfitting. We propose a Bayesian model based on automatic relevance determination in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior. A family of majorization-minimization algorithms is proposed for maximum a posteriori (MAP) estimation. A subset of scale parameters is driven to a small lower bound in the course of inference, with the effect of pruning the corresponding spurious components. We demonstrate the efficacy and robustness of our algorithms by performing extensive experiments on synthetic data, the swimmer dataset, a music decomposition example and a stock price prediction task.
Submitted 5 October, 2012; v1 submitted 25 November, 2011;
originally announced November 2011.
-
Online algorithms for Nonnegative Matrix Factorization with the Itakura-Saito divergence
Authors:
Augustin Lefèvre,
Francis Bach,
Cédric Févotte
Abstract:
Nonnegative matrix factorization (NMF) is now a common tool for audio source separation. When learning NMF on large audio databases, one major drawback is that the complexity in time is O(FKN) when updating the dictionary (where (F, N) is the dimension of the input power spectrograms, and K the number of basis spectra), thus forbidding its application on signals longer than an hour. We provide an online algorithm with a complexity of O(FK) in time and memory for updates in the dictionary. We show on audio simulations that the online approach is faster for short audio signals and makes it possible to analyze audio signals of several hours.
Submitted 21 June, 2011;
originally announced June 2011.