-
Average performance analysis of the stochastic gradient method for online PCA
Authors:
Stephane Chretien,
Christophe Guyeux,
Zhen-Wai Olivier HO
Abstract:
This paper studies the complexity of the stochastic gradient algorithm for PCA when the data are observed in a streaming setting. We also propose an online approach for selecting the learning rate. Simulation experiments confirm the practical relevance of the plain stochastic gradient approach and show that drastic improvements can be achieved by learning the learning rate.
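For readers unfamiliar with stochastic gradient PCA, the sketch below illustrates the plain approach for the leading principal component only, using Oja's rule: one stochastic gradient step per incoming sample, followed by renormalisation. It is a minimal sketch under stated assumptions, not the authors' implementation: the step schedule `1/(t + 10)`, the helper names and the synthetic three-dimensional stream are hypothetical, and the paper's online learning-rate selection is not reproduced.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def oja_first_component(stream, dim, step=lambda t: 1.0 / (t + 10), seed=0):
    """Estimate the leading eigenvector of E[x x^T] from a stream of samples.

    Each sample triggers one stochastic gradient step followed by
    renormalisation: w <- normalize(w + eta_t * (x . w) * x).
    `step` is a hypothetical decreasing schedule, not the paper's adaptive rule.
    """
    rng = random.Random(seed)
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for t, x in enumerate(stream):
        proj = sum(wi * xi for wi, xi in zip(w, x))  # scalar x . w
        w = normalize([wi + step(t) * proj * xi for wi, xi in zip(w, x)])
    return w


def synthetic_stream(n, seed=1):
    """Hypothetical stream with covariance diag(9, 1, 1); top eigenvector is e1."""
    rng = random.Random(seed)
    for _ in range(n):
        yield [3.0 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


w = oja_first_component(synthetic_stream(5000), dim=3)
print([round(abs(c), 3) for c in w])  # first entry should be close to 1
```

Only the current vector `w` is kept in memory, so storage stays proportional to the dimension regardless of how many samples arrive, which is what makes this kind of update suitable for the streaming setting.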
Submitted 3 April, 2018;
originally announced April 2018.
-
Using the LASSO for gene selection in bladder cancer data
Authors:
Stéphane Chrétien,
Christophe Guyeux,
Michael Boyer-Guittaut,
Régis Delage-Mouroux,
Françoise Descôtes
Abstract:
Given a gene expression data array for a set of bladder cancer patients together with their tumor states, it may be difficult to determine which genes can serve as disease markers when the array is large and possibly contains outliers and missing data. An additional difficulty is that the observations (tumor states) in the regression problem are discrete. In this article, we solve these problems on real data, first using a clustering approach and then Least Absolute Shrinkage and Selection Operator (LASSO) estimators in a nonlinear regression problem involving discrete variables, as described in the recent work of Plan and Vershynin. Finally, gene markers of the most severe tumor state are identified using the proposed approach.
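The key point borrowed from Plan and Vershynin is that the ordinary LASSO, fitted directly to discrete responses generated by a nonlinear function of a Gaussian linear measurement, still recovers the relevant coefficients up to a scale factor. The sketch below illustrates this with proximal gradient descent (ISTA) on binary labels `y = sign(<a, beta>)`. Everything here is an assumption for illustration: the Gaussian design, the binary labels (real tumor states may have more levels), `lam = 0.1`, the helper names and the synthetic data; the clustering step of the article is omitted.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(A, y, lam, n_iter=300):
    """Minimise (1/2n)||y - A b||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = len(A), len(A[0])
    # ||A||_F^2 / n upper-bounds the Lipschitz constant of the smooth part,
    # so its inverse is a safe (if conservative) fixed step size.
    step = n / sum(a * a for row in A for a in row)
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [yi - sum(aij * bj for aij, bj in zip(row, b)) for row, yi in zip(A, y)]
        grad = [-sum(A[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, grad)]
    return b


# Hypothetical data: Gaussian "expression" matrix, 3 active genes, binary state.
rng = random.Random(0)
n, p = 200, 20
beta = [1.0, -1.0, 1.0] + [0.0] * (p - 3)
A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1.0 if sum(a * bj for a, bj in zip(row, beta)) >= 0 else -1.0 for row in A]

b_hat = lasso_ista(A, y, lam=0.1)
top3 = sorted(range(p), key=lambda j: -abs(b_hat[j]))[:3]
print(sorted(top3))  # indices of the genes with the largest coefficients
```

Since the estimate is only expected to match `beta` up to a positive scale factor, the sketch compares the selected genes and their signs rather than the coefficient magnitudes.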
Submitted 20 April, 2015;
originally announced April 2015.
-
A Bregman Proximal ADMM for NMF with Outliers: Estimating features with missing values and outliers: a Bregman-proximal point algorithm for robust Non-negative Matrix Factorization with application to gene expression analysis
Authors:
Stéphane Chrétien,
Christophe Guyeux,
Bastien Conesa,
Régis Delage-Mouroux,
Michèle Jouvenot,
Philippe Huetz,
Françoise Descôtes
Abstract:
Extracting the relevant features from a given dataset is a difficult task, recently solved in the non-negative data case by the Non-negative Matrix Factorization (NMF) method. The objective of this research work is to extend this method to data that are missing and/or corrupted by outliers. To do so, the data are denoised, missing values are imputed, and outliers are detected while a low-rank non-negative matrix factorization of the recovered matrix is computed. To achieve this goal, a combination of Bregman proximal methods and the Augmented Lagrangian scheme is used, in a way similar to the so-called Alternating Direction Method of Multipliers (ADMM). Finally, an application to the analysis of gene expression data from patients with bladder cancer is proposed.
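The Bregman-proximal ADMM of the article is not reproduced here. As a simpler, well-known stand-in, the sketch below uses masked (weighted) NMF with the multiplicative updates of Lee and Seung to show the basic idea of fitting a non-negative low-rank model on observed entries only and imputing the missing ones from the factors. It does not perform denoising or outlier detection. The helper names, `rank=2`, the iteration count and the toy matrix are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python product of nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(c) for c in zip(*A)]


def masked_error(X, W, H):
    """Squared Frobenius error over observed (non-None) entries only."""
    WH = matmul(W, H)
    return sum((x - v) ** 2 for xr, vr in zip(X, WH) for x, v in zip(xr, vr) if x is not None)


def _mult_update(F, num, den, eps):
    """Elementwise F * num / (den + eps); keeps every entry non-negative."""
    return [[f * a / (b + eps) for f, a, b in zip(fr, ar, br)] for fr, ar, br in zip(F, num, den)]


def masked_nmf(X, rank, n_iter=500, eps=1e-12, seed=0):
    """Weighted NMF by multiplicative updates; None entries of X are ignored."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    M = [[0.0 if x is None else 1.0 for x in row] for row in X]  # observation mask
    X0 = [[0.0 if x is None else x for x in row] for row in X]   # equals M * X

    def mask(P):
        return [[mij * pij for mij, pij in zip(mr, pr)] for mr, pr in zip(M, P)]

    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T (M*X)) / (W^T (M*(W H)))
        Wt = transpose(W)
        H = _mult_update(H, matmul(Wt, X0), matmul(Wt, mask(matmul(W, H))), eps)
        # W <- W * ((M*X) H^T) / ((M*(W H)) H^T)
        Ht = transpose(H)
        W = _mult_update(W, matmul(X0, Ht), matmul(mask(matmul(W, H)), Ht), eps)
    return W, H


# Hypothetical toy data: exact rank-2 non-negative matrix with two hidden entries.
rng = random.Random(1)
W0 = [[rng.random() for _ in range(2)] for _ in range(6)]
H0 = [[rng.random() for _ in range(5)] for _ in range(2)]
X = matmul(W0, H0)
X[0][1] = None
X[3][4] = None

W, H = masked_nmf(X, rank=2)
X_hat = matmul(W, H)  # X_hat[0][1] and X_hat[3][4] are the imputed values
```

Because missing entries carry zero weight in both the numerator and the denominator of each update, they never influence the fit; their values are simply read off the product of the learned factors.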
Submitted 9 February, 2015;
originally announced February 2015.