-
Deep linear neural networks with arbitrary loss: All local minima are global
Authors:
Thomas Laurent,
James von Brecht
Abstract:
We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differenti…
▽ More
We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima.
△ Less
Submitted 24 July, 2018; v1 submitted 4 December, 2017;
originally announced December 2017.
-
Enhanced Lasso Recovery on Graph
Authors:
Xavier Bresson,
Thomas Laurent,
James von Brecht
Abstract:
This work aims at recovering signals that are sparse on graphs. Compressed sensing offers techniques for signal recovery from a few linear measurements and graph Fourier analysis provides a signal representation on graph. In this paper, we leverage these two frameworks to introduce a new Lasso recovery algorithm on graphs. More precisely, we present a non-convex, non-smooth algorithm that outperfo…
▽ More
This work aims at recovering signals that are sparse on graphs. Compressed sensing offers techniques for signal recovery from a few linear measurements and graph Fourier analysis provides a signal representation on graph. In this paper, we leverage these two frameworks to introduce a new Lasso recovery algorithm on graphs. More precisely, we present a non-convex, non-smooth algorithm that outperforms the standard convex Lasso technique. We carry out numerical experiments on three benchmark graph datasets.
△ Less
Submitted 19 June, 2015;
originally announced June 2015.
-
Consistency of Cheeger and Ratio Graph Cuts
Authors:
Nicolas Garcia Trillos,
Dejan Slepcev,
James von Brecht,
Thomas Laurent,
Xavier Bresson
Abstract:
This paper establishes the consistency of a family of graph-cut-based algorithms for clustering of data clouds. We consider point clouds obtained as samples of a ground-truth measure. We investigate approaches to clustering based on minimizing objective functionals defined on proximity graphs of the given sample. Our focus is on functionals based on graph cuts like the Cheeger and ratio cuts. We s…
▽ More
This paper establishes the consistency of a family of graph-cut-based algorithms for clustering of data clouds. We consider point clouds obtained as samples of a ground-truth measure. We investigate approaches to clustering based on minimizing objective functionals defined on proximity graphs of the given sample. Our focus is on functionals based on graph cuts like the Cheeger and ratio cuts. We show that minimizers of the these cuts converge as the sample size increases to a minimizer of a corresponding continuum cut (which partitions the ground truth measure). Moreover, we obtain sharp conditions on how the connectivity radius can be scaled with respect to the number of sample points for the consistency to hold. We provide results for two-way and for multiway cuts. Furthermore we provide numerical experiments that illustrate the results and explore the optimality of scaling in dimension two.
△ Less
Submitted 24 November, 2014;
originally announced November 2014.
-
An Incremental Reseeding Strategy for Clustering
Authors:
Xavier Bresson,
Huiyi Hu,
Thomas Laurent,
Arthur Szlam,
James von Brecht
Abstract:
In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves…
▽ More
In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmarks datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms that achieve comparable results in terms of accuracy. We also describe a coarsen, cluster and refine approach similar to GRACLUS and METIS that removes an additional order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy.
△ Less
Submitted 15 June, 2014;
originally announced June 2014.
-
Multiclass Total Variation Clustering
Authors:
Xavier Bresson,
Thomas Laurent,
David Uminsky,
James H. von Brecht
Abstract:
Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on…
▽ More
Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches.
△ Less
Submitted 5 June, 2013;
originally announced June 2013.