CUR Matrix Approximation through Convex Optimization for Feature Selection
Authors:
Kathryn Linehan,
Radu Balan
Abstract:
The singular value decomposition (SVD) is commonly used in applications requiring a low rank matrix approximation. However, the singular vectors cannot be interpreted in terms of the original data. For applications requiring this type of interpretation, e.g., selection of important data matrix columns or rows, the approximate CUR matrix factorization can be used. Work on the CUR matrix approximation has generally focused on algorithm development, theoretical guarantees, and applications. In this work, we present a novel deterministic CUR formulation and algorithm with theoretical convergence guarantees. The algorithm utilizes convex optimization, finds important columns and rows separately, and allows the user to control the number of important columns and rows selected from the original data matrix. We present numerical results and demonstrate the effectiveness of our CUR algorithm as a feature selection method on gene expression data. These results are compared to those using the SVD and other CUR algorithms as the feature selection method. Lastly, we present a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM), and compare the performance of multiple CUR algorithms in this application.
Submitted 21 May, 2025;
originally announced May 2025.
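To make the CUR factorization in the abstract above concrete, the following is a minimal sketch, not the authors' convex-optimization algorithm. It assumes the important column and row indices are already known, and uses a simple largest-norm rule as a stand-in for the selection step. The function names `cur_from_indices` and `top_norm_indices` are hypothetical, and the middle factor is the standard choice U = C⁺AR⁺ (pseudoinverses of C and R).

```python
import numpy as np

def cur_from_indices(A, col_idx, row_idx):
    """Build C, U, R from chosen column and row indices of A."""
    C = A[:, col_idx]            # actual columns of A
    R = A[row_idx, :]            # actual rows of A
    # Standard middle factor: minimizes ||A - C U R||_F for fixed C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

def top_norm_indices(M, k):
    """Placeholder selection rule (an assumption, not the paper's method):
    indices of the k columns of M with the largest Euclidean norm."""
    norms = np.linalg.norm(M, axis=0)
    return np.sort(np.argsort(norms)[-k:])

rng = np.random.default_rng(0)
# Synthetic rank-4 data matrix, 30 samples by 20 features.
A = rng.standard_normal((30, 4)) @ rng.standard_normal((4, 20))

cols = top_norm_indices(A, 4)      # selected features (columns)
rows = top_norm_indices(A.T, 4)    # selected samples (rows)
C, U, R = cur_from_indices(A, cols, rows)

rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print("selected columns:", cols)
print("relative error:", rel_err)
```

Because C and R consist of actual columns and rows of the data matrix, the selected indices can be read directly as chosen features and samples, which is the interpretability advantage over singular vectors that the abstract points to.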
Approximation of the Proximal Operator of the $\ell_\infty$ Norm Using a Neural Network
Authors:
Kathryn Linehan,
Radu Balan
Abstract:
Computing the proximal operator of the $\ell_\infty$ norm, $\textbf{prox}_{\alpha\|\cdot\|_\infty}(\mathbf{x})$, generally requires a sort of the input data, or at least a partial sort similar to quicksort. In order to avoid using a sort, we present an $O(m)$ approximation of $\textbf{prox}_{\alpha\|\cdot\|_\infty}(\mathbf{x})$ using a neural network. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach. We show that the network outperforms a "vanilla neural network" that does not use feature selection. We also present an algorithm with corresponding theory to calculate $\textbf{prox}_{\alpha\|\cdot\|_\infty}(\mathbf{x})$ exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation.
Submitted 20 August, 2024;
originally announced August 2024.
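The abstract above notes that the exact proximal operator typically relies on a sort and is related to the Moreau decomposition. The following is a minimal sketch of that textbook route, not the authors' exact algorithm or their neural network. It uses the identity prox of α‖·‖∞ at x equals x minus the Euclidean projection of x onto the ℓ1 ball of radius α, with a standard sort-based projection. The function names `project_l1_ball` and `prox_linf` are hypothetical, and α > 0 is assumed.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {u : ||u||_1 <= radius}, radius > 0.
    Sort-based, so the cost is O(m log m) for a vector of length m."""
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()                      # already inside the ball
    u = np.sort(abs_v)[::-1]                 # magnitudes, descending
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    # Largest index whose entry stays positive after soft-thresholding.
    rho = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)  # soft-threshold level
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)

def prox_linf(x, alpha):
    """prox of alpha * ||.||_inf at x, via the Moreau decomposition:
    x minus the projection of x onto the l1 ball of radius alpha."""
    return x - project_l1_ball(x, alpha)

x = np.array([3.0, 1.0, -2.0])
p = prox_linf(x, 1.0)
print(p)  # entries clipped to magnitude 2: [ 2.  1. -2.]

# When alpha >= ||x||_1 the prox is the zero vector.
z = prox_linf(np.array([0.5, -0.2]), 1.0)
print(z)
```

The sort inside `project_l1_ball` is the step the abstract's O(m) neural-network approximation is designed to avoid.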