-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs show low accuracy and poor calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To ground research and policymaking in a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Conformally rigid graphs
Authors:
Stefan Steinerberger,
Rekha R. Thomas
Abstract:
Given a finite, simple, connected graph $G=(V,E)$ with $|V|=n$, we consider the associated graph Laplacian matrix $L = D - A$ with eigenvalues $0 = λ_1 < λ_2 \leq \dots \leq λ_n$. One can also consider the same graph equipped with positive edge weights $w:E \rightarrow \mathbb{R}_{> 0}$ normalized to $\sum_{e \in E} w_e = |E|$ and the associated weighted Laplacian matrix $L_w$. We say that $G$ is conformally rigid if constant edge weights maximize the second eigenvalue $λ_2(w)$ of $L_w$ over all $w$ and minimize $λ_n(w')$ of $L_{w'}$ over all $w'$, i.e., for all $w,w'$, $$ λ_2(w) \leq λ_2(1) \leq λ_n(1) \leq λ_n(w').$$ Conformal rigidity requires an extraordinary amount of symmetry in $G$. Every edge-transitive graph is conformally rigid. We prove that every distance-regular graph, and hence every strongly regular graph, is conformally rigid. Certain special graph embeddings can be used to characterize conformal rigidity. Cayley graphs can be conformally rigid but need not be; we prove a sufficient criterion. We also find a small set of conformally rigid graphs that do not belong to any of the above categories; these include the Hoffman graph, the crossing number graph 6B, and others. Conformal rigidity can be certified via semidefinite programming; we provide explicit examples.
Submitted 5 April, 2025; v1 submitted 18 February, 2024;
originally announced February 2024.
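To make the definition above concrete, here is a minimal Python sketch (NumPy assumed; `looks_conformally_rigid` and the other helpers are hypothetical names of ours) that searches for normalized positive edge weights violating $λ_2(w) \leq λ_2(1) \leq λ_n(1) \leq λ_n(w')$. It is only a heuristic: finding a violation disproves conformal rigidity, while finding none proves nothing, which is why certification by semidefinite programming is needed for a proof. The weight distribution and number of trials are arbitrary choices.

```python
import numpy as np

def weighted_laplacian(n, edges, w):
    """L_w = D_w - A_w for an undirected graph with edge weights w."""
    L = np.zeros((n, n))
    for (i, j), w_e in zip(edges, w):
        L[i, i] += w_e
        L[j, j] += w_e
        L[i, j] -= w_e
        L[j, i] -= w_e
    return L

def extreme_eigenvalues(n, edges, w):
    """(lambda_2, lambda_n) of L_w; eigvalsh returns eigenvalues in ascending order."""
    ev = np.linalg.eigvalsh(weighted_laplacian(n, edges, w))
    return ev[1], ev[-1]

def looks_conformally_rigid(n, edges, trials=500, seed=0, tol=1e-9):
    """Random search for normalized weights violating
    lambda_2(w) <= lambda_2(1) <= lambda_n(1) <= lambda_n(w').
    False means a violation was found; True is only evidence, not a proof."""
    rng = np.random.default_rng(seed)
    m = len(edges)
    l2_const, ln_const = extreme_eigenvalues(n, edges, np.ones(m))
    for _ in range(trials):
        w = rng.uniform(0.1, 2.0, size=m)
        w *= m / w.sum()  # normalize so that sum_e w_e = |E|
        l2, ln = extreme_eigenvalues(n, edges, w)
        if l2 > l2_const + tol or ln < ln_const - tol:
            return False
    return True

cycle8 = [(i, (i + 1) % 8) for i in range(8)]  # edge-transitive
path4 = [(0, 1), (1, 2), (2, 3)]               # not edge-transitive
print(looks_conformally_rigid(8, cycle8))  # True: C_8 is conformally rigid
print(looks_conformally_rigid(4, path4))
```

For $C_8$ no violation can exist, since every edge-transitive graph is conformally rigid. For the path on four vertices, which is not edge-transitive, moving weight toward the middle edge increases $λ_2$ to first order, so the search is expected to report `False`.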
-
Spectrahedral Geometry of Graph Sparsifiers
Authors:
Catherine Babecki,
Stefan Steinerberger,
Rekha R. Thomas
Abstract:
We propose an approach to graph sparsification based on the idea of preserving the smallest $k$ eigenvalues and eigenvectors of the graph Laplacian. This is motivated by the fact that small eigenvalues and their associated eigenvectors tend to be more informative of the global structure and geometry of the graph than larger eigenvalues and their eigenvectors. The set of all weighted subgraphs of a graph $G$ that have the same first $k$ eigenvalues (and eigenvectors) as $G$ is the intersection of a polyhedron with a cone of positive semidefinite matrices. We discuss the geometry of these sets and deduce the natural scale of $k$. Various families of graphs illustrate our construction.
Submitted 9 June, 2023;
originally announced June 2023.
-
May the force be with you
Authors:
Yulan Zhang,
Anna C. Gilbert,
Stefan Steinerberger
Abstract:
Modern methods in dimensionality reduction are dominated by nonlinear attraction-repulsion force-based methods (this includes t-SNE, UMAP, ForceAtlas2, LargeVis, and many more). The purpose of this paper is to demonstrate that all such methods, by design, come with an additional feature that is computed automatically along the way, namely the vector field associated with these forces. We show how this vector field provides additional high-quality information and propose a general refinement strategy based on ideas from Morse theory. The effectiveness of these ideas is illustrated specifically using t-SNE on synthetic and real-life data sets.
Submitted 13 August, 2022;
originally announced August 2022.
-
Random Walks, Equidistribution and Graphical Designs
Authors:
Stefan Steinerberger,
Rekha R. Thomas
Abstract:
Let $G=(V,E)$ be a $d$-regular graph on $n$ vertices and let $μ_0$ be a probability measure on $V$. The act of moving to a randomly chosen neighbor leads to a sequence of probability measures supported on $V$ given by $μ_{k+1} = A D^{-1} μ_k$, where $A$ is the adjacency matrix and $D$ is the diagonal matrix of vertex degrees of $G$. Ordering the eigenvalues of $ A D^{-1}$ as $1 = λ_1 \geq |λ_2| \geq \dots \geq |λ_n| \geq 0$, it is well-known that the graphs for which $|λ_2|$ is small are those in which the random walk process converges quickly to the uniform distribution: for all initial probability measures $μ_0$ and all $k \geq 0$, $$ \sum_{v \in V} \left| μ_k(v) - \frac{1}{n} \right|^2 \leq λ_2^{2k}.$$ One could wonder whether this rate can be improved for specific initial probability measures $μ_0$. We show that if $G$ is regular, then for any $1 \leq \ell \leq n$, there exists a probability measure $μ_0$ supported on at most $\ell$ vertices so that $$ \sum_{v \in V} \left| μ_k(v) - \frac{1}{n} \right|^2 \leq λ_{\ell+1}^{2k}.$$ The result has applications in the graph sampling problem: we show that these measures have good sampling properties for reconstructing global averages.
Submitted 10 June, 2022;
originally announced June 2022.
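The convergence bound above can be checked numerically. The sketch below (Python with NumPy; function names are ours) runs $μ_{k+1} = A D^{-1} μ_k$ from a point mass on the Petersen graph, a 3-regular graph chosen only as an example, and compares $\sum_v |μ_k(v) - 1/n|^2$ with $|λ_2|^{2k}$. It does not construct the improved measures supported on $\ell$ vertices.

```python
import numpy as np

def petersen_adjacency():
    """Adjacency matrix of the Petersen graph (3-regular, 10 vertices)."""
    A = np.zeros((10, 10))
    for i in range(5):
        for a, b in [(i, (i + 1) % 5),           # outer 5-cycle
                     (5 + i, 5 + (i + 2) % 5),   # inner pentagram
                     (i, 5 + i)]:                # spokes
            A[a, b] = A[b, a] = 1.0
    return A

def distances_to_uniform(A, mu0, steps):
    """sum_v |mu_k(v) - 1/n|^2 for k = 0, ..., steps, with mu_{k+1} = A D^{-1} mu_k."""
    n = A.shape[0]
    P = A / A.sum(axis=0)  # A D^{-1}: column j divided by deg(j)
    mu = np.asarray(mu0, dtype=float)
    out = []
    for _ in range(steps + 1):
        out.append(np.sum((mu - 1.0 / n) ** 2))
        mu = P @ mu
    return np.array(out)

A = petersen_adjacency()
P = A / A.sum(axis=0)  # symmetric here because the graph is regular
abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(P)))[::-1]
lam2 = abs_eigs[1]  # |lambda_2| in the ordering 1 = lambda_1 >= |lambda_2| >= ...
mu0 = np.zeros(10)
mu0[0] = 1.0  # point mass on vertex 0
dist = distances_to_uniform(A, mu0, steps=10)
bound = lam2 ** (2 * np.arange(11))
print(lam2, np.all(dist <= bound + 1e-12))
```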
-
Sums of Distances on Graphs and Embeddings into Euclidean Space
Authors:
Stefan Steinerberger
Abstract:
Let $G=(V,E)$ be a finite, connected graph. We consider a greedy selection of vertices: given a list of vertices $x_1, \dots, x_k$, take $x_{k+1}$ to be any vertex maximizing the sum of distances to the existing vertices and iterate: we keep adding the 'most remote' vertex. The frequency with which the vertices of the graph appear in this sequence converges to a set of probability measures with nice properties. The support of these measures is, generically, given by a rather small number of vertices $m \ll |V|$. This suggests that the graph $G$ is at most '$m$-dimensional'; we make this precise by exhibiting an explicit $1$-Lipschitz embedding $φ: G \rightarrow \ell^1(\mathbb{R}^m)$ with good properties.
Submitted 5 May, 2022; v1 submitted 28 April, 2022;
originally announced April 2022.
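A minimal sketch of the greedy 'most remote' selection, in plain Python: distances come from breadth-first search, and ties are broken by the smallest vertex index, an arbitrary choice since any maximizer is allowed. Vertices may be selected repeatedly, which is what makes the frequencies meaningful. The function names are hypothetical and the embedding $φ$ is not implemented.

```python
from collections import deque

def bfs_distances(adj, source):
    """Graph distances from source (adj is a list of neighbor lists)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        w = queue.popleft()
        for z in adj[w]:
            if dist[z] < 0:
                dist[z] = dist[w] + 1
                queue.append(z)
    return dist

def most_remote_sequence(adj, start, length):
    """Greedy sequence: each new vertex maximizes the sum of distances to all earlier ones."""
    n = len(adj)
    dist = [bfs_distances(adj, s) for s in range(n)]  # all-pairs distances
    seq = [start]
    total = list(dist[start])  # total[w] = sum of d(w, x) over x in seq
    for _ in range(length - 1):
        nxt = max(range(n), key=lambda w: total[w])  # ties: smallest index wins
        seq.append(nxt)
        for w in range(n):
            total[w] += dist[nxt][w]
    return seq

def frequencies(seq, n):
    """Empirical frequency of each vertex in the sequence."""
    counts = [0] * n
    for w in seq:
        counts[w] += 1
    return [c / len(seq) for c in counts]

# Path graph 0 - 1 - 2 - 3 - 4.
path_adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
seq = most_remote_sequence(path_adj, start=0, length=6)
print(seq)                  # [0, 4, 0, 4, 0, 4]
print(frequencies(seq, 5))  # [0.5, 0.0, 0.0, 0.0, 0.5]
```

On the path, the frequencies concentrate on the two endpoints, a support of size $m = 2$.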
-
A common variable minimax theorem for graphs
Authors:
Ronald R. Coifman,
Nicholas F. Marshall,
Stefan Steinerberger
Abstract:
Let $\mathcal{G} = \{G_1 = (V, E_1), \dots, G_m = (V, E_m)\}$ be a collection of $m$ graphs defined on a common set of vertices $V$ but with different edge sets $E_1, \dots, E_m$. Informally, a function $f :V \rightarrow \mathbb{R}$ is smooth with respect to $G_k = (V,E_k)$ if $f(u) \sim f(v)$ whenever $(u, v) \in E_k$. We study the problem of understanding whether there exists a nonconstant function that is smooth with respect to all graphs in $\mathcal{G}$, simultaneously, and how to find it if it exists.
Submitted 30 July, 2021;
originally announced July 2021.
-
A 0.502$\cdot$MaxCut Approximation using Quadratic Programming
Authors:
Stefan Steinerberger
Abstract:
We study the MaxCut problem for graphs $G=(V,E)$. The problem is NP-hard, and there are two main approximation algorithms with theoretical guarantees: (1) the Goemans & Williamson algorithm uses semidefinite programming to provide a 0.878$\cdot$MaxCut approximation (which, if the Unique Games Conjecture is true, is the best that can be done in polynomial time), and (2) Trevisan proposed an algorithm using spectral graph theory from which a 0.614$\cdot$MaxCut approximation can be obtained. We discuss a new approach using a specific quadratic program and prove that its solution can be used to obtain at least a 0.502$\cdot$MaxCut approximation. The algorithm seems to perform well in practice.
Submitted 29 April, 2021;
originally announced April 2021.
-
t-SNE, Forceful Colorings and Mean Field Limits
Authors:
Yulan Zhang,
Stefan Steinerberger
Abstract:
t-SNE is one of the most commonly used force-based nonlinear dimensionality reduction methods. This paper has two contributions. The first is forceful colorings, an idea that is also applicable to other force-based methods (UMAP, ForceAtlas2, ...). In every equilibrium, the attractive and repulsive forces acting on a particle cancel out; however, both the size and the direction of the attractive (or repulsive) forces acting on a particle are related to its properties, so the force vector can serve as an additional feature. Second, we analyze the case of t-SNE acting on a single homogeneous cluster (modeled by affinities coming from the adjacency matrix of a random $k$-regular graph); we derive a mean-field model that leads to interesting questions in the classical calculus of variations. The model predicts that, in the limit, the t-SNE embedding of a single perfectly homogeneous cluster is not a point but a thin annulus of diameter $\sim k^{-1/4} n^{-1/4}$. This is supported by numerical results. The mean-field ansatz extends to other force-based dimensionality reduction methods.
Submitted 25 February, 2021;
originally announced February 2021.
-
Max-Cut via Kuramoto-type Oscillators
Authors:
Stefan Steinerberger
Abstract:
We consider the Max-Cut problem. Let $G = (V,E)$ be a graph with adjacency matrix $(a_{ij})_{i,j=1}^{n}$. Burer, Monteiro & Zhang proposed finding, for $n$ angles $\left\{θ_1, θ_2, \dots, θ_n\right\} \subset [0, 2π]$, minima of the energy $$ f(θ_1, \dots, θ_n) = \sum_{i,j=1}^{n} a_{ij} \cos{(θ_i - θ_j)}$$ because configurations achieving a global minimum lead to a partition of size 0.878$\cdot$Max-Cut(G). This approach is known to be computationally viable and leads to very good results in practice. We prove that by replacing $\cos{(θ_i - θ_j)}$ with an explicit function $g_{\varepsilon}(θ_i - θ_j)$, global minima of this new functional lead to a $(1-\varepsilon)\cdot$Max-Cut(G). This suggests some interesting algorithms that perform well. It also shows that the problem of finding approximate global minima of energy functionals of this type is NP-hard in general.
Submitted 9 February, 2021;
originally announced February 2021.
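The sketch below illustrates the Burer, Monteiro & Zhang approach described above, not the modified functional $g_{\varepsilon}$: plain gradient descent on $f(θ) = \sum_{i,j} a_{ij} \cos(θ_i - θ_j)$, followed by splitting the angles with the best of a grid of lines through the origin. Step size, iteration count, the direction grid, and the function names are arbitrary choices of ours, and gradient descent may stop at a local rather than a global minimum.

```python
import numpy as np

def cut_value(A, side):
    """Total weight of edges whose endpoints lie on different sides."""
    s = np.asarray(side, dtype=bool)
    return float(np.sum(A * (s[:, None] != s[None, :])) / 2.0)

def kuramoto_maxcut(A, iters=3000, n_directions=360, seed=0):
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=A.shape[0])
    step = 0.1 / max(1.0, A.sum(axis=1).max())  # conservative step size
    for _ in range(iters):
        diff = theta[:, None] - theta[None, :]
        # df/dtheta_k = -2 sum_j a_kj sin(theta_k - theta_j) for symmetric A
        grad = -2.0 * np.sum(A * np.sin(diff), axis=1)
        theta -= step * grad
    # Round: split the circle by a line through the origin and keep the best split.
    # Directions in [0, pi) suffice, since alpha + pi gives the complementary partition.
    best_cut, best_side = -1.0, None
    for alpha in np.linspace(0.0, np.pi, n_directions, endpoint=False):
        side = np.cos(theta - alpha) >= 0.0
        cut = cut_value(A, side)
        if cut > best_cut:
            best_cut, best_side = cut, side
    return best_cut, best_side

triangle = np.ones((3, 3)) - np.eye(3)
cut, side = kuramoto_maxcut(triangle)
print(cut, side)  # the maximum cut of a triangle has 2 edges
```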
-
Neural Collapse with Cross-Entropy Loss
Authors:
Jianfeng Lu,
Stefan Steinerberger
Abstract:
We consider the variational problem of cross-entropy loss with $n$ feature vectors on a unit hypersphere in $\mathbb{R}^d$. We prove that when $d \geq n - 1$, the global minimum is given by the simplex equiangular tight frame, which justifies the neural collapse behavior. We also prove that as $n \rightarrow \infty$ with fixed $d$, the minimizing points will distribute uniformly on the hypersphere and show a connection with the frame potential of Benedetto & Fickus.
Submitted 18 January, 2021; v1 submitted 15 December, 2020;
originally announced December 2020.
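The simplex equiangular tight frame consists of $n$ unit vectors whose pairwise inner products all equal $-1/(n-1)$, which is why it needs $d \geq n - 1$. The sketch below (Python with NumPy; `simplex_etf` is a hypothetical name) builds it by centering the standard basis vectors of $\mathbb{R}^n$ and expressing them in an orthonormal basis of the hyperplane orthogonal to $(1, \dots, 1)$. It only constructs the configuration and does not evaluate the cross-entropy functional.

```python
import numpy as np

def simplex_etf(n, d):
    """n unit vectors in R^d (d >= n - 1) with pairwise inner products -1/(n - 1)."""
    if d < n - 1:
        raise ValueError("a simplex ETF with n vectors needs d >= n - 1")
    M = np.eye(n) - np.ones((n, n)) / n  # rows e_i - (1/n)(1, ..., 1)
    # M is the orthogonal projection onto the hyperplane orthogonal to (1, ..., 1);
    # its first n - 1 left singular vectors form an orthonormal basis of that hyperplane.
    U, _, _ = np.linalg.svd(M)
    coords = M @ U[:, : n - 1]  # coordinates of each row in that basis
    coords /= np.linalg.norm(coords, axis=1, keepdims=True)
    out = np.zeros((n, d))
    out[:, : n - 1] = coords
    return out

X = simplex_etf(4, 5)
print(np.round(X @ X.T, 6))  # 1 on the diagonal, -1/3 off the diagonal
```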
-
On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares
Authors:
Stefan Steinerberger
Abstract:
We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} \left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2},$$ where $x$ is the solution of $Ax = b$. This is a curious inequality: the last term has one more matrix applied to the error $x_k - x$ than the remaining terms. If $x_k - x$ is mainly comprised of singular vectors associated with large singular values, stochastic gradient descent leads to quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values has a smoothing effect.
Submitted 1 September, 2020; v1 submitted 26 July, 2020;
originally announced July 2020.
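The exact stochastic gradient step is not specified here. The sketch below assumes the randomized Kaczmarz variant (sample row $i$ with probability $\|a_i\|^2/\|A\|_F^2$, then project onto $\langle a_i, x\rangle = b_i$), which is one common form of SGD for least squares, and then reports the error $x_k - x$ in the basis of right singular vectors of $A$, where the inequality above predicts that components along large singular values are damped first. Matrix size, seeds, iteration count, and function names are arbitrary choices of ours.

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters, seed=0):
    """SGD-type iteration for ||Ax - b||^2 (randomized Kaczmarz variant, an assumption)."""
    rng = np.random.default_rng(seed)
    row_norms2 = np.sum(A ** 2, axis=1)
    probs = row_norms2 / row_norms2.sum()  # P(row i) = ||a_i||^2 / ||A||_F^2
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        i = rng.choice(len(b), p=probs)
        # Project x onto the hyperplane {z : <a_i, z> = b_i}, which contains the solution.
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true
x0 = np.zeros(n)
x_k = randomized_kaczmarz(A, b, x0, iters=2000)

# Error components along right singular vectors (sorted by decreasing singular value).
_, sing, Vt = np.linalg.svd(A)
err0 = Vt @ (x0 - x_true)
errk = Vt @ (x_k - x_true)
print("top-10 directions:   ", np.abs(errk[:10]).mean(), "initially", np.abs(err0[:10]).mean())
print("bottom-10 directions:", np.abs(errk[-10:]).mean(), "initially", np.abs(err0[-10:]).mean())
```

Because every sampled hyperplane contains the solution, each projection step cannot increase $\|x_k - x\|_2$; the printout shows how that decrease is distributed across singular directions.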
-
A Spectral Approach to the Shortest Path Problem
Authors:
Stefan Steinerberger
Abstract:
Let $G=(V,E)$ be a simple, connected graph. One is often interested in a short path between two vertices $u,v$. We propose a spectral algorithm: construct the function $φ:V \rightarrow \mathbb{R}_{\geq 0}$ $$ φ= \arg\min_{f:V \rightarrow \mathbb{R} \atop f(u) = 0, f \not\equiv 0} \frac{\sum_{(w_1, w_2) \in E}{(f(w_1)-f(w_2))^2}}{\sum_{w \in V}{f(w)^2}}.$$ $φ$ can also be understood as the eigenvector associated with the smallest eigenvalue of the Laplacian matrix $L=D-A$ after the $u$-th row and column have been removed. We start at the vertex $v$ and construct a path from $v$ to $u$: at each step, we move to the neighbor for which $φ$ is smallest. This algorithm provably terminates and results in a short path from $v$ to $u$, often the shortest. The efficiency of this method is due to a discrete analogue of a phenomenon in partial differential equations that is not well understood. We prove optimality for trees and discuss a number of open questions.
Submitted 16 April, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
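A direct implementation of the procedure described above, as a sketch in Python with NumPy (function names are ours). It assumes that both $G$ and $G - u$ are connected, so that the eigenvector of the smallest eigenvalue of the reduced Laplacian can be taken strictly positive away from $u$. The comment in the loop records why the walk terminates: for $w \neq u$ the eigenvalue equation forces the average of $φ$ over the neighbors of $w$ to be $(1 - λ/\deg(w))\,φ(w) < φ(w)$.

```python
import numpy as np

def spectral_path(A, u, v):
    """Greedy descent on phi from v to u; assumes G and G - u are connected."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    keep = [w for w in range(n) if w != u]
    L_u = L[np.ix_(keep, keep)]  # Laplacian with the u-th row and column removed
    _, vecs = np.linalg.eigh(L_u)  # eigenvalues in ascending order
    phi = np.zeros(n)  # phi(u) = 0
    phi[keep] = np.abs(vecs[:, 0])  # eigenvector of the smallest eigenvalue has one sign
    path = [v]
    while path[-1] != u:
        w = path[-1]
        nbrs = np.flatnonzero(A[w])
        nxt = int(nbrs[np.argmin(phi[nbrs])])
        # mean of phi over the neighbors of w is (1 - lambda/deg(w)) phi(w) < phi(w),
        # so the minimizing neighbor has strictly smaller phi and the walk cannot cycle.
        if phi[nxt] >= phi[w]:
            raise RuntimeError("phi did not decrease; check that G - u is connected")
        path.append(nxt)
    return path

def grid_adjacency(rows, cols):
    """Adjacency matrix of a rows x cols grid graph, vertex r * cols + c."""
    A = np.zeros((rows * cols, rows * cols))
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:
                A[k, k + 1] = A[k + 1, k] = 1.0
            if r + 1 < rows:
                A[k, k + cols] = A[k + cols, k] = 1.0
    return A

path = spectral_path(grid_adjacency(5, 5), u=0, v=24)
print(path, "length", len(path) - 1)  # a shortest path in the 5x5 grid has length 8
```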
-
Spectral Clustering Revisited: Information Hidden in the Fiedler Vector
Authors:
Adela DePavia,
Stefan Steinerberger
Abstract:
We are interested in the clustering problem on graphs: it is known that if there are two underlying clusters, then the signs of the eigenvector corresponding to the second largest eigenvalue of the adjacency matrix can reliably reconstruct the two clusters. We argue that the vertices for which the eigenvector has the largest and the smallest entries, respectively, are unusually strongly connected to their own cluster and more reliably classified than the rest. This can be regarded as a discrete version of the Hot Spots conjecture and should be useful in applications. We give a rigorous proof for the stochastic block model and several examples.
Submitted 22 March, 2020;
originally announced March 2020.
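The following sketch (Python with NumPy; the block-model parameters, the 10% threshold, and all names are illustrative assumptions) samples a two-community stochastic block model, classifies vertices by the sign of the eigenvector of the second-largest adjacency eigenvalue, and reports the accuracy overall and on the vertices with the most extreme entries. It is a numerical illustration only, not the rigorous argument.

```python
import numpy as np

def two_block_sbm(n_per_block, p, q, rng):
    """Adjacency matrix and labels of a two-community stochastic block model."""
    labels = np.repeat([0, 1], n_per_block)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    n = 2 * n_per_block
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels

def sign_accuracy(A, labels, frac=0.1):
    """Accuracy of sign-based clustering, overall and on the most extreme entries."""
    _, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    f = vecs[:, -2]  # eigenvector of the second-largest eigenvalue
    correct = (f > 0).astype(int) == labels
    if correct.mean() < 0.5:  # the sign of an eigenvector is arbitrary
        correct = ~correct
    k = max(1, int(frac * len(f)))
    order = np.argsort(f)
    extremes = np.concatenate([order[:k], order[-k:]])
    return correct.mean(), correct[extremes].mean()

rng = np.random.default_rng(0)
A, labels = two_block_sbm(100, p=0.5, q=0.05, rng=rng)
overall, extreme = sign_accuracy(A, labels)
print(f"overall accuracy {overall:.3f}, on the 10% most extreme entries {extreme:.3f}")
```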
-
Randomly Aggregated Least Squares for Support Recovery
Authors:
Ofir Lindenbaum,
Stefan Steinerberger
Abstract:
We study the problem of exact support recovery: given an (unknown) vector $θ\in \left\{-1,0,1\right\}^D$, we have access to the noisy measurement $$ y = Xθ+ ω,$$ where $X \in \mathbb{R}^{N \times D}$ is a (known) Gaussian matrix and the noise $ω\in \mathbb{R}^N$ is an (unknown) Gaussian vector. How small can we choose $N$ and still reliably recover the support of $θ$? We present RAWLS (Randomly Aggregated UnWeighted Least Squares Support Recovery): the main idea is to take random subsets of the $N$ equations, perform a least squares recovery on each reduced system, and then average over many random subsets. We show that the proposed procedure can provably recover an approximation of $θ$ and demonstrate its use in support recovery through numerical examples.
Submitted 9 November, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
The Spectral Underpinning of word2vec
Authors:
Ariel Jaffe,
Yuval Kluger,
Ofir Lindenbaum,
Jonathan Patsenker,
Erez Peterfreund,
Stefan Steinerberger
Abstract:
word2vec, due to Mikolov et al. (2013), is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, a theoretical justification is still lacking. The main contribution of our paper is a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
Submitted 9 November, 2020; v1 submitted 27 February, 2020;
originally announced February 2020.
-
Non-Convex Planar Harmonic Maps
Authors:
Shahar Z. Kovalsky,
Noam Aigerman,
Ingrid Daubechies,
Michael Kazhdan,
Jianfeng Lu,
Stefan Steinerberger
Abstract:
We formulate a novel characterization of a family of invertible maps between two-dimensional domains. Our work follows two classic results: the Radó-Kneser-Choquet (RKC) theorem, which establishes the invertibility of harmonic maps into a convex planar domain; and Tutte's embedding theorem for planar graphs (RKC's discrete counterpart), which proves the invertibility of piecewise linear maps of triangulated domains satisfying a discrete-harmonic principle, into a convex planar polygon. In both theorems, the convexity of the target domain is essential for ensuring invertibility. We extend these characterizations, in both the continuous and discrete cases, by replacing convexity with a less restrictive condition. In the continuous case, Alessandrini and Nesi provide a characterization of invertible harmonic maps into non-convex domains with a smooth boundary by adding conditions on orientation preservation along the boundary. We extend their results by defining a condition on the normal derivatives along the boundary, which we call the cone condition; this condition is tractable and geometrically intuitive, encoding a weak notion of local invertibility. The cone condition enables us to extend the result of Alessandrini and Nesi to the case of harmonic maps into non-convex domains with a piecewise-smooth boundary. In the discrete case, we use an analog of the cone condition to characterize invertible discrete-harmonic piecewise-linear maps of triangulations. This gives an analog of our continuous results and characterizes invertible discrete-harmonic maps in terms of the orientation of triangles incident on the boundary.
Submitted 5 January, 2020;
originally announced January 2020.
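Tutte's embedding theorem, the discrete counterpart mentioned above, can be illustrated in a few lines: fix the boundary vertices of a triangulated disk on a convex polygon, place every interior vertex at the average of its neighbors (uniform weights, one choice of positive weights), and check that every triangle keeps its orientation. The mesh (a square grid with one extra center vertex per cell) and all helper names are our own choices; the cone condition for non-convex targets is not implemented.

```python
import numpy as np

def grid_with_centers(m):
    """Triangulated m x m grid with one center vertex per cell.
    Returns (number of vertices, CCW triangles, boundary loop in CCW order)."""
    def vid(i, j):  # grid vertex in row i, column j (x = j, y = i)
        return i * m + j
    tris = []
    center = m * m  # center vertices are numbered after the grid vertices
    for i in range(m - 1):
        for j in range(m - 1):
            a, b, c, d = vid(i, j), vid(i, j + 1), vid(i + 1, j + 1), vid(i + 1, j)
            tris += [(a, b, center), (b, c, center), (c, d, center), (d, a, center)]
            center += 1
    boundary = ([vid(0, j) for j in range(m)]                    # bottom, left to right
                + [vid(i, m - 1) for i in range(1, m)]           # right side, upwards
                + [vid(m - 1, j) for j in range(m - 2, -1, -1)]  # top, right to left
                + [vid(i, 0) for i in range(m - 2, 0, -1)])      # left side, downwards
    return center, tris, boundary

def tutte_embedding(n, tris, boundary):
    """Boundary on a regular polygon; each interior vertex at the average of its neighbors."""
    nbrs = [set() for _ in range(n)]
    for t in tris:
        for x in t:
            nbrs[x].update(y for y in t if y != x)
    pos = np.zeros((n, 2))
    k = len(boundary)
    for s, vtx in enumerate(boundary):  # convex target: regular k-gon, CCW
        pos[vtx] = (np.cos(2 * np.pi * s / k), np.sin(2 * np.pi * s / k))
    is_boundary = np.zeros(n, dtype=bool)
    is_boundary[boundary] = True
    interior = [v for v in range(n) if not is_boundary[v]]
    index = {v: r for r, v in enumerate(interior)}
    M = np.zeros((len(interior), len(interior)))
    rhs = np.zeros((len(interior), 2))
    for r, v in enumerate(interior):
        # deg(v) p_v - sum of interior neighbors = sum of (fixed) boundary neighbors
        M[r, r] = len(nbrs[v])
        for w in nbrs[v]:
            if is_boundary[w]:
                rhs[r] += pos[w]
            else:
                M[r, index[w]] -= 1.0
    pos[interior] = np.linalg.solve(M, rhs)
    return pos

def signed_areas(pos, tris):
    """Signed area of each triangle; positive means counterclockwise."""
    t = np.asarray(tris)
    p, q, r = pos[t[:, 0]], pos[t[:, 1]], pos[t[:, 2]]
    return 0.5 * ((q[:, 0] - p[:, 0]) * (r[:, 1] - p[:, 1])
                  - (q[:, 1] - p[:, 1]) * (r[:, 0] - p[:, 0]))

n, tris, boundary = grid_with_centers(5)
pos = tutte_embedding(n, tris, boundary)
print(len(tris), "triangles; all positively oriented:", bool(np.all(signed_areas(pos, tris) > 0)))
```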
-
Extreme Values of the Fiedler Vector on Trees
Authors:
Roy R. Lederman,
S. Steinerberger
Abstract:
Let $G$ be a connected tree on $n$ vertices and let $L = D-A$ denote the Laplacian matrix of $G$. The second-smallest eigenvalue $λ_{2}(G) > 0$, also known as the algebraic connectivity, as well as the associated eigenvector $φ_2$ have been of substantial interest. We investigate when the maxima and minima of $φ_2$ are attained at the endpoints of the longest path in $G$. Our results also apply to more general graphs that 'behave globally' like a tree but can exhibit more complicated local structure. The crucial new ingredient is a reproducing formula for the eigenvector $φ_k$.
Submitted 10 March, 2023; v1 submitted 17 December, 2019;
originally announced December 2019.
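The question studied above can be explored numerically. The sketch below (Python with NumPy; names and the example tree are ours) computes the vertices where $φ_2$ attains its minimum and maximum and compares them with the endpoints of a longest path, found by the classical double breadth-first search that is valid on trees. It checks individual examples only; no general criterion is encoded.

```python
import numpy as np
from collections import deque

def tree_laplacian(n, edges):
    """L = D - A for an unweighted graph given by an edge list."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def farthest_vertex(adj, s):
    """A vertex at maximal BFS distance from s."""
    dist = [-1] * len(adj)
    dist[s] = 0
    queue = deque([s])
    while queue:
        w = queue.popleft()
        for z in adj[w]:
            if dist[z] < 0:
                dist[z] = dist[w] + 1
                queue.append(z)
    return max(range(len(adj)), key=lambda w: dist[w])

def longest_path_endpoints(n, edges):
    """Two BFS passes find the endpoints of a longest path in a tree."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    a = farthest_vertex(adj, 0)
    return a, farthest_vertex(adj, a)

def fiedler_extremes(n, edges):
    """(argmin, argmax) of the eigenvector of the second-smallest eigenvalue."""
    _, vecs = np.linalg.eigh(tree_laplacian(n, edges))
    phi2 = vecs[:, 1]
    return int(np.argmin(phi2)), int(np.argmax(phi2))

# Example tree: a path 0-1-2-3-4-5 with an extra leaf 6 attached to vertex 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
print("longest path endpoints:", longest_path_endpoints(7, edges))
print("argmin/argmax of phi_2:", fiedler_extremes(7, edges))
```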
-
Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations
Authors:
Dmitry Kobak,
George Linderman,
Stefan Steinerberger,
Yuval Kluger,
Philipp Berens
Abstract:
t-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the "crowding problem" of SNE. Here, we develop an efficient implementation of t-SNE for a $t$-distribution kernel with an arbitrary degree-of-freedom parameter $ν$, with $ν\to\infty$ corresponding to SNE and $ν=1$ corresponding to standard t-SNE. Using theoretical analysis and toy examples, we show that $ν<1$ can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
Submitted 4 April, 2019; v1 submitted 15 February, 2019;
originally announced February 2019.
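One way to write this family of kernels, which we assume here since the exact formula is not given above, is $k_ν(d) = (1 + d^2/ν)^{-ν}$: $ν = 1$ gives the Cauchy kernel of standard t-SNE, $ν \to \infty$ recovers the Gaussian $e^{-d^2}$ of SNE, and $ν < 1$ gives heavier tails. The snippet below (Python with NumPy; `t_kernel` is a hypothetical name) only evaluates the kernel and is not an implementation of the t-SNE variant itself.

```python
import numpy as np

def t_kernel(d, nu):
    """Low-dimensional similarity (1 + d^2 / nu)^(-nu); parametrization assumed."""
    d = np.asarray(d, dtype=float)
    return (1.0 + d ** 2 / nu) ** (-nu)

d = np.array([0.0, 1.0, 3.0, 10.0])
for nu in [0.5, 1.0, 100.0]:
    print(f"nu={nu:>5}:", np.round(t_kernel(d, nu), 4))
print("Gaussian:  ", np.round(np.exp(-d ** 2), 4))
```

At large distances the $ν = 0.5$ row stays well above the $ν = 1$ row, which is the heavier tail that pushes clusters further apart in the embedding.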
-
Recovering Trees with Convex Clustering
Authors:
Eric C. Chi,
Stefan Steinerberger
Abstract:
Convex clustering refers, for given $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$, to the minimization problem $$ u(γ) = \underset{u_1, \dots, u_n}{\arg\min}\; \sum_{i=1}^{n}{\lVert x_i - u_i \rVert^2} + γ\sum_{i,j=1}^{n}{w_{ij} \lVert u_i - u_j\rVert},$$ where $w_{ij} \geq 0$ is an affinity that quantifies the similarity between $x_i$ and $x_j$. We prove that if the affinities $w_{ij}$ reflect a tree structure in $\left\{x_1, \dots, x_n\right\}$, then the convex clustering solution path reconstructs the tree exactly. The main technical ingredient implies the following combinatorial byproduct: for every set $\left\{x_1, \dots, x_n \right\} \subset \mathbb{R}^p$ of $n \geq 2$ distinct points, there exist at least $n/6$ points with the property that for any of these points $x$ there is a unit vector $v \in \mathbb{R}^p$ such that, when viewed from $x$, 'most' points lie in the direction $v$: $$ \frac{1}{n-1}\sum_{\substack{i=1 \\ x_i \neq x}}^{n}{ \left\langle \frac{x_i - x}{\lVert x_i - x \rVert}, v \right\rangle} \geq \frac{1}{4}.$$
Submitted 28 June, 2018; v1 submitted 28 June, 2018;
originally announced June 2018.
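The combinatorial statement above can be checked numerically. For a fixed point $x$, the best unit vector $v$ is the normalized mean of the directions $(x_i - x)/\lVert x_i - x \rVert$, so by Cauchy-Schwarz the maximal left-hand side equals the length of that mean. The sketch below (Python with NumPy; the random point cloud and function name are ours) computes this score for every point and counts how many reach $1/4$.

```python
import numpy as np

def best_direction_scores(X):
    """For each x in X: max over unit v of (1/(n-1)) sum_{x_i != x} <(x_i - x)/||x_i - x||, v>.
    The maximizer is the normalized mean direction, so the maximum equals the length
    of the mean of the unit directions. Assumes the rows of X are distinct."""
    n = X.shape[0]
    scores = np.empty(n)
    for k in range(n):
        diff = np.delete(X, k, axis=0) - X[k]
        units = diff / np.linalg.norm(diff, axis=1, keepdims=True)
        scores[k] = np.linalg.norm(units.mean(axis=0))
    return scores

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 3))
scores = best_direction_scores(X)
count = int(np.sum(scores >= 0.25))
print(f"{count} of {len(X)} points reach 1/4 (at least {len(X) / 6:.0f} are guaranteed)")
```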
-
On the Dual Geometry of Laplacian Eigenfunctions
Authors:
Alexander Cloninger,
Stefan Steinerberger
Abstract:
We discuss the geometry of Laplacian eigenfunctions $-Δφ= λφ$ on compact manifolds $(M,g)$ and combinatorial graphs $G=(V,E)$. The 'dual' geometry of Laplacian eigenfunctions is well understood on $\mathbb{T}^d$ (identified with $\mathbb{Z}^d$) and $\mathbb{R}^n$ (which is self-dual). The dual geometry plays a tremendous role in various fields of pure and applied mathematics. The purpose of our paper is to point out a notion of similarity between eigenfunctions that allows one to reconstruct that geometry. Our measure of 'similarity' $ α(φ_λ, φ_μ)$ between eigenfunctions $φ_λ$ and $φ_μ$ is given by a global average of local correlations $$ α(φ_λ, φ_μ)^2 = \| φ_λ φ_μ \|_{L^2}^{-2}\int_{M}{ \left( \int_{M}{ p(t,x,y)( φ_λ(y) - φ_λ(x))( φ_μ(y) - φ_μ(x)) dy} \right)^2 dx},$$ where $p(t,x,y)$ is the classical heat kernel and $t$ is chosen such that $e^{-t λ} + e^{-t μ} = 1$. This notion recovers all classical notions of duality but is equally applicable to other (rough) geometries and graphs; many numerical examples in different continuous and discrete settings illustrate the result.
Submitted 25 April, 2018;
originally announced April 2018.
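A possible discrete analogue of $α$, sketched in Python with NumPy under our own assumptions: the graph is a path, the Laplacian is $L = D - A$, the heat kernel is $p(t) = e^{-tL}$ computed from the eigendecomposition, integrals become sums over vertices, and $t$ is found by bisection. Both eigenvalues must be nonzero for $e^{-tλ} + e^{-tμ} = 1$ to have a solution $t > 0$. Function names are hypothetical.

```python
import numpy as np

def path_laplacian(n):
    """L = D - A for the path graph on n vertices."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

def heat_time(lam, mu, iters=200):
    """Bisection for t > 0 with exp(-t*lam) + exp(-t*mu) = 1 (requires lam, mu > 0)."""
    lo, hi = 0.0, 10.0 / min(lam, mu)  # left-hand side is 2 at t = 0 and < 1 at t = hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.exp(-mid * lam) + np.exp(-mid * mu) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha(L, i, j):
    """Graph analogue of alpha(phi_lambda, phi_mu) for eigenpairs i and j of L."""
    vals, vecs = np.linalg.eigh(L)
    phi, psi = vecs[:, i], vecs[:, j]
    t = heat_time(vals[i], vals[j])
    P = (vecs * np.exp(-t * vals)) @ vecs.T  # heat kernel e^{-tL}
    dphi = phi[None, :] - phi[:, None]  # dphi[x, y] = phi(y) - phi(x)
    dpsi = psi[None, :] - psi[:, None]
    local = np.sum(P * dphi * dpsi, axis=1)  # local correlation at each vertex x
    return np.sqrt(np.sum(local ** 2) / np.sum((phi * psi) ** 2))

L = path_laplacian(20)
print(np.round([[alpha(L, i, j) for j in range(1, 6)] for i in range(1, 6)], 3))
```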
-
Numerical Integration on Graphs: where to sample and how to weigh
Authors:
George C. Linderman,
Stefan Steinerberger
Abstract:
Let $G=(V,E,w)$ be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset $W \subset V$ of vertices and weights $a_w$ such that $$ \frac{1}{|V|}\sum_{v \in V}{f(v)} \sim \sum_{w \in W}{a_w f(w)}$$ for functions $f:V \rightarrow \mathbb{R}$ that are `smooth' with respect to the geometry of the graph. The main applications are problems where $f$ is known to depend somehow on the underlying graph but is expensive to evaluate even at a single vertex. We prove an inequality showing that the integration problem can be rewritten as a geometric problem (`the optimal packing of heat balls'). We discuss how one would construct approximate solutions of the heat ball packing problem; numerical examples demonstrate the efficiency of the method.
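As an illustration of the "heat ball packing" idea, the sketch below treats column $w$ of $e^{-tL}$ as the heat ball centred at vertex $w$ and greedily adds the vertex whose heat ball overlaps least (in the $\ell^2$ sense) with those already chosen. This greedy rule, the uniform weights $a_w = 1/|W|$, the time $t$, and the name `greedy_heat_ball_sampling` are all hypothetical choices for illustration; the abstract does not specify how the authors construct their approximate solutions or weights.

```python
import numpy as np
from scipy.linalg import expm


def greedy_heat_ball_sampling(L, k, t):
    """Hypothetical greedy heuristic: pick k vertices whose heat balls overlap little.

    The heat ball around w is column w of exp(-tL). At each step the vertex that
    increases ||sum of chosen heat balls||_2^2 the least is added; weights are uniform.
    """
    n = L.shape[0]
    H = expm(-t * L)
    col_sq = (H * H).sum(axis=0)            # ||H[:, w]||^2 for every vertex w
    u = np.zeros(n)                          # sum of the heat balls chosen so far
    chosen = []
    for _ in range(k):
        cost = 2.0 * (u @ H) + col_sq       # ||u + H[:, w]||^2 - ||u||^2 for every w
        if chosen:
            cost[chosen] = np.inf           # never pick the same vertex twice
        w = int(np.argmin(cost))
        chosen.append(w)
        u += H[:, w]
    weights = np.full(k, 1.0 / k)
    return chosen, weights


# Usage: cycle graph with 30 vertices and a smooth test function
n = 30
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
L = np.diag(A.sum(axis=1)) - A
W, a = greedy_heat_ball_sampling(L, k=5, t=2.0)
f = np.cos(2 * np.pi * np.arange(n) / n) + 1.0
estimate = np.dot(a, f[W])                   # approximates f.mean() using only 5 evaluations
```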
Submitted 19 March, 2018;
originally announced March 2018.
-
Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
Authors:
George C. Linderman,
Manas Rachh,
Jeremy G. Hoskins,
Stefan Steinerberger,
Yuval Kluger
Abstract:
t-distributed Stochastic Neighborhood Embedding (t-SNE) is a method for dimensionality reduction and visualization that has become widely popular in recent years. Efficient implementations of t-SNE are available, but they scale poorly to datasets with hundreds of thousands to millions of high-dimensional data points. We present Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE), which dramatically accelerates the computation of t-SNE. The most time-consuming step of t-SNE is a convolution that we accelerate by interpolating onto an equispaced grid and subsequently using the fast Fourier transform to perform the convolution. We also optimize the computation of input similarities in high dimensions using multi-threaded approximate nearest neighbors. We further present a modification to t-SNE called "late exaggeration," which allows for easier identification of clusters in t-SNE embeddings. Finally, for datasets that cannot be loaded into memory, we present out-of-core randomized principal component analysis (oocPCA), so that the top principal components of a dataset can be computed without ever fully loading the matrix, hence allowing t-SNE of large datasets to be computed on resource-limited machines.
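The interpolate-then-FFT idea can be illustrated in one dimension. The sketch below approximates $s_i = \sum_j q_j/(1 + (y_i - y_j)^2)$ (the Student-t kernel used in t-SNE) by spreading the weights $q_j$ onto an equispaced grid, convolving on the grid with a zero-padded FFT, and interpolating back. It assumes simple linear interpolation (a simplification, not necessarily the interpolation scheme used in FIt-SNE), 1-D points that are not all equal, and a hypothetical function name `fft_kernel_sum`.

```python
import numpy as np


def fft_kernel_sum(y, q, n_grid=512):
    """Approximate s_i = sum_j q_j / (1 + (y_i - y_j)^2) for 1-D points y (not all equal)."""
    lo, hi = y.min(), y.max()
    h = (hi - lo) / (n_grid - 1)                 # equispaced grid spacing
    pos = (y - lo) / h
    left = np.clip(np.floor(pos).astype(int), 0, n_grid - 2)
    frac = pos - left                            # position inside the grid cell, in [0, 1]

    # 1) spread the weights q_j onto the two neighbouring grid nodes (linear interpolation)
    charges = np.zeros(n_grid)
    np.add.at(charges, left, q * (1.0 - frac))
    np.add.at(charges, left + 1, q * frac)

    # 2) kernel at every grid offset -(n_grid-1) .. (n_grid-1); linear convolution via FFT
    offsets = h * np.arange(-(n_grid - 1), n_grid)
    kern = 1.0 / (1.0 + offsets ** 2)
    size = 3 * n_grid - 2                        # full linear-convolution length: no wrap-around
    conv = np.fft.irfft(np.fft.rfft(charges, size) * np.fft.rfft(kern, size), size)
    grid_vals = conv[n_grid - 1: 2 * n_grid - 1] # kernel sum evaluated at each grid node

    # 3) interpolate the grid values back to the original points
    return (1.0 - frac) * grid_vals[left] + frac * grid_vals[left + 1]


# Usage: compare against the O(n^2) direct sum
rng = np.random.default_rng(0)
y = rng.standard_normal(400)
q = rng.random(400)
approx = fft_kernel_sum(y, q)
direct = (1.0 / (1.0 + (y[:, None] - y[None, :]) ** 2)) @ q
```

The cost is $O(n)$ for the two interpolation steps plus $O(N \log N)$ for the grid convolution, instead of $O(n^2)$ for the direct sum; the accuracy is controlled by the grid size.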
Submitted 24 December, 2017;
originally announced December 2017.
-
Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science
Authors:
George C. Linderman,
Gal Mishne,
Yuval Kluger,
Stefan Steinerberger
Abstract:
If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log{\log{n}}$ points chosen randomly among its $c_{d,2} \log{n}$-nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$-nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
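The construction is easy to reproduce: connect each point to $k'$ points drawn uniformly at random from its $k$ nearest neighbors and measure the largest connected component. In the sketch below, $k = \lceil 2 \log n \rceil$ and $k' = \lceil 2 \log\log n \rceil$ are hypothetical stand-ins for the unspecified constants $c_{d,2}$ and $c_{d,1}$, and the function names are illustrative only.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def random_near_neighbor_graph(X, k, k_prime, rng):
    """Connect each point to k_prime points drawn at random among its k nearest neighbours."""
    n = len(X)
    _, idx = cKDTree(X).query(X, k=k + 1)   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k_prime)
    cols = np.concatenate([rng.choice(idx[i, 1:], size=k_prime, replace=False)
                           for i in range(n)])
    return coo_matrix((np.ones(n * k_prime), (rows, cols)), shape=(n, n))


def giant_component_fraction(A):
    """Fraction of vertices in the largest connected component (edges treated as undirected)."""
    _, labels = connected_components(A, directed=False)
    return np.bincount(labels).max() / A.shape[0]


# Usage: n uniform points in [0,1]^2; the factors 2 are hypothetical choices of constants
rng = np.random.default_rng(0)
n = 2000
X = rng.random((n, 2))
k = int(np.ceil(2 * np.log(n)))
k_prime = int(np.ceil(2 * np.log(np.log(n))))
A = random_near_neighbor_graph(X, k, k_prime, rng)
frac = giant_component_fraction(A)
```

The graph stores $n k'$ directed edges instead of $n k$, which is where the savings for affinity-matrix computations come from.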
Submitted 13 November, 2017;
originally announced November 2017.
-
Stability, Fairness and Random Walks in the Bargaining Problem
Authors:
Jakob Kapeller,
Stefan Steinerberger
Abstract:
We study the classical bargaining problem and its two canonical solutions (Nash and Kalai-Smorodinsky) from a novel point of view: we ask whether the solution is stable if both players are able to distort the underlying bargaining process by referring to a third party (e.g. a court). By exploring the simplest case, where decisions of the third party are made randomly, we obtain a stable solution in which players do not have any incentive to refer to such a third party. While neither the Nash nor the Kalai-Smorodinsky solution is able to ensure stability when reference to a third party is possible, we find that the Kalai-Smorodinsky solution seems to always dominate the stable allocation, which constitutes novel support in favor of the latter.
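For readers unfamiliar with the two canonical solutions, the sketch below computes both on a hypothetical Pareto frontier $u_2 = 1 - u_1^2$, $u_1 \in [0,1]$, with the disagreement point assumed to be $(0,0)$. The Nash solution maximizes $u_1 u_2$; the Kalai-Smorodinsky solution is the frontier point on the segment from the disagreement point to the ideal point (each player's best attainable utility). The frontier, the disagreement point, and the helper names are assumptions for illustration; the paper's stable allocation is not reproduced here.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar


def frontier(u1):
    """Hypothetical decreasing Pareto frontier u2 = f(u1) on [0, 1]."""
    return 1.0 - u1 ** 2


def nash_solution(f, lo=0.0, hi=1.0):
    """Maximise the Nash product u1 * u2 (disagreement point assumed at the origin)."""
    res = minimize_scalar(lambda x: -x * f(x), bounds=(lo, hi), method="bounded")
    return res.x, f(res.x)


def ks_solution(f, lo=0.0, hi=1.0):
    """Frontier point proportional to the ideal point (max u1, max u2) = (hi, f(lo))."""
    ideal1, ideal2 = hi, f(lo)
    # root of u2 * ideal1 - u1 * ideal2 = 0 along the frontier
    x = brentq(lambda x: f(x) * ideal1 - x * ideal2, lo, hi)
    return x, f(x)


nash = nash_solution(frontier)   # analytically u1 = 1/sqrt(3), u2 = 2/3
ks = ks_solution(frontier)       # analytically u1 = u2 = (sqrt(5) - 1) / 2
```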
Submitted 8 July, 2017;
originally announced July 2017.
-
Clustering with t-SNE, provably
Authors:
George C. Linderman,
Stefan Steinerberger
Abstract:
t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the `early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter $α$ and step size $h$. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the Swiss roll) improves. We also discuss a connection to spectral clustering methods.
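A minimal sketch of the early exaggeration phase: plain gradient descent on the t-SNE objective in which every input similarity $p_{ij}$ is multiplied by $α$. To keep it short it assumes Gaussian input similarities with a fixed bandwidth (instead of perplexity calibration), and the values $α = 12$, $h = 1$, and 300 iterations are hypothetical choices, not the parameter rules derived in the paper. On two well-separated clusters, each cluster contracts while the clusters drift apart.

```python
import numpy as np


def gaussian_affinities(X, sigma):
    """Symmetric input similarities p_ij, normalised to sum to 1 (fixed bandwidth)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def early_exaggeration(P, alpha=12.0, h=1.0, n_iter=300, seed=0):
    """Gradient descent on the t-SNE objective with p_ij replaced by alpha * p_ij."""
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.standard_normal((P.shape[0], 2))    # small random initialisation in R^2
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]            # diff[i, j] = y_i - y_j
        w = 1.0 / (1.0 + (diff ** 2).sum(axis=-1))      # Student-t kernel (1 + |y_i - y_j|^2)^-1
        np.fill_diagonal(w, 0.0)
        Q = w / w.sum()                                 # output similarities q_ij
        # gradient: 4 * sum_j (alpha p_ij - q_ij)(y_i - y_j)(1 + |y_i - y_j|^2)^-1
        grad = 4.0 * (((alpha * P - Q) * w)[:, :, None] * diff).sum(axis=1)
        Y -= h * grad
    return Y


# Usage: two well-separated Gaussian clusters of 50 points each in R^5
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50, 5)), rng.standard_normal((50, 5)) + 10.0])
P = gaussian_affinities(X, sigma=np.sqrt(5.0))
Y = early_exaggeration(P)
```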
Submitted 8 June, 2017;
originally announced June 2017.
-
Ulam Sequences and Ulam Sets
Authors:
Noah Kravitz,
Stefan Steinerberger
Abstract:
The Ulam sequence is given by $a_1 =1, a_2 = 2$, and then, for $n \geq 3$, the element $a_n$ is defined as the smallest integer that can be written as the sum of two distinct earlier elements in a unique way. This gives the sequence $1, 2, 3, 4, 6, 8, 11, 13, 16, \dots$, which has a mysterious quasi-periodic behavior that is not understood. Ulam's definition naturally extends to higher dimensions: for a set of initial vectors $\left\{v_1, \dots, v_k\right\} \subset \mathbb{R}^n$, we define a sequence by repeatedly adding the smallest elements that can be uniquely written as the sum of two distinct vectors already in the set. The resulting sets have very rich structure that turns out to be universal for many commuting binary operations. We give examples of different types of behavior, prove several universality results, and describe new unexplained phenomena.
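The one-dimensional definition translates directly into a sieve-like computation: keep a count of how many ways each integer is a sum of two distinct terms, and scan upward for the next integer with exactly one representation. The sketch below assumes positive initial terms $0 < a_1 < a_2$ (so every new sum exceeds the current candidate and the upward scan stays valid); the function name is hypothetical. The higher-dimensional vector version is not covered.

```python
def ulam_sequence(a1=1, a2=2, limit=100):
    """All terms <= limit of the Ulam-type sequence with initial terms 0 < a1 < a2."""
    terms = [a1, a2]
    reps = [0] * (limit + 1)       # reps[m] = number of ways m = a + b with a < b both terms
    if a1 + a2 <= limit:
        reps[a1 + a2] += 1
    for m in range(a2 + 1, limit + 1):
        if reps[m] == 1:           # unique representation, so m joins the sequence
            for t in terms:        # every new sum t + m exceeds m, so the scan stays valid
                if t + m <= limit:
                    reps[t + m] += 1
            terms.append(m)
    return terms


classic = ulam_sequence(1, 2, 47)  # the sequence 1, 2, 3, 4, 6, 8, 11, ... quoted above
other = ulam_sequence(2, 5, 200)   # a different choice of initial conditions
```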
Submitted 27 August, 2018; v1 submitted 4 May, 2017;
originally announced May 2017.
-
A Hidden Signal in the Ulam sequence
Authors:
Stefan Steinerberger
Abstract:
The Ulam sequence is defined as $a_1 =1, a_2 = 2$ and $a_n$ being the smallest integer that can be written as the sum of two distinct earlier elements in a unique way. This gives $$1, 2, 3, 4, 6, 8, 11, 13, 16, 18, 26, 28, 36, 38, 47, \dots$$ Ulam remarked that understanding the sequence, which has been described as 'quite erratic', seems difficult and indeed nothing is known. We report the empirical discovery of a surprising global rigidity phenomenon: there seems to exist a real $α\sim 2.5714474995\dots$ such that $$\left\{αa_n: n\in \mathbb{N}\right\} \quad \mbox{mod}~2π\quad \mbox{generates an absolutely continuous \textit{non-uniform} measure}$$ supported on a subset of $\mathbb{T}$. Indeed, for the first $10^7$ elements of Ulam's sequence, $$ \cos{\left( 2.5714474995~ a_n\right)} < 0 \qquad \mbox{for all}~a_n \notin \left\{2, 3, 47, 69\right\}.$$ The same phenomenon arises for some other initial conditions $a_1, a_2$: the distribution functions look very different from each other and have curious shapes. A similar but more subtle phenomenon seems to arise in Lagarias' variant of MacMahon's 'primes of measurement' sequence.
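The empirical claim about $\cos(2.5714474995\, a_n)$ can be checked on an initial segment of the sequence. The sketch below generates all Ulam terms up to a hypothetical cutoff of 5000 (far fewer than the $10^7$ elements reported) and collects the terms for which the cosine is not negative; by the observation quoted above, these should be exactly $2, 3, 47, 69$. The function name and cutoff are illustrative assumptions.

```python
import math


def ulam_terms(limit):
    """Terms <= limit (limit >= 3) of Ulam's sequence 1, 2, 3, 4, 6, 8, ..."""
    terms, reps = [1, 2], [0] * (limit + 1)
    reps[3] = 1                     # 3 = 1 + 2 is the only sum of the two initial terms
    for m in range(3, limit + 1):
        if reps[m] == 1:            # m has a unique representation, so it is a term
            for t in terms:
                if t + m <= limit:
                    reps[t + m] += 1
            terms.append(m)
    return terms


ALPHA = 2.5714474995                # truncated constant quoted in the abstract
terms = ulam_terms(5000)
exceptions = {a for a in terms if math.cos(ALPHA * a) >= 0}
```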
Submitted 5 July, 2016; v1 submitted 1 July, 2015;
originally announced July 2015.
-
A filtering technique for Markov chains with applications to spectral embedding
Authors:
Stefan Steinerberger
Abstract:
Spectral methods have proven to be a highly effective tool in understanding the intrinsic geometry of a high-dimensional data set $\left\{x_i \right\}_{i=1}^{n} \subset \mathbb{R}^d$. The key ingredient is the construction of a Markov chain on the set, where transition probabilities depend on the distance between elements, for example where for every $1 \leq j \leq n$ the probability of going from $x_j$ to $x_i$ is proportional to $$ p_{ij} \sim \exp \left( -\frac{1}{\varepsilon}\|x_i -x_j\|^2_{\ell^2(\mathbb{R}^d)}\right) \qquad \mbox{where}~\varepsilon>0~\mbox{is a free parameter}.$$ We propose a method which increases the self-consistency of such Markov chains before spectral methods are applied. Instead of directly using a Markov transition matrix $P$, we set $p_{ii} = 0$ and rescale, thereby obtaining a transition matrix $P^*$ modeling a non-lazy random walk. We then create a new transition matrix $Q = (q_{ij})_{i,j=1}^{n}$ by demanding that for fixed $j$ the quantity $q_{ij}$ be proportional to $$ q_{ij} \sim \min((P^*)_{ij}, ((P^*)^2)_{ij}, \dots, ((P^*)^k)_{ij}) \qquad \mbox{where usually}~ k=2.$$ We consider several classical data sets, show that this simple method can increase the efficiency of spectral methods and prove that it can correct randomly introduced errors in the kernel.
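The filtering step is a few lines of linear algebra. The sketch below follows the abstract's conventions: column $j$ holds the transition probabilities out of $x_j$, so normalization is over $i$ for fixed $j$; the diagonal is zeroed and columns rescaled to obtain $P^*$; and $Q$ is the column-normalized entrywise minimum of $P^*, (P^*)^2, \dots, (P^*)^k$. The function names, random test data, and $\varepsilon = 0.1$ are hypothetical; the subsequent spectral embedding is omitted.

```python
import numpy as np


def gaussian_transition(X, eps):
    """Column-stochastic P: column j holds the probabilities of moving from x_j to each x_i."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / eps)
    return K / K.sum(axis=0, keepdims=True)


def filtered_transition(P, k=2):
    """Build Q with q_ij proportional to min(P*_ij, (P*^2)_ij, ..., (P*^k)_ij) for fixed j."""
    P_star = P.copy()
    np.fill_diagonal(P_star, 0.0)                  # forbid staying put ...
    P_star /= P_star.sum(axis=0, keepdims=True)    # ... and rescale: the non-lazy walk P*
    power, M = P_star.copy(), P_star.copy()
    for _ in range(k - 1):
        power = power @ P_star                     # (P*)^m is again column-stochastic
        M = np.minimum(M, power)                   # entrywise minimum over the powers
    return M / M.sum(axis=0, keepdims=True)


rng = np.random.default_rng(0)
X = rng.random((40, 2))
P = gaussian_transition(X, eps=0.1)
Q = filtered_transition(P, k=2)
```

Because $(P^*)_{ii} = 0$, the minimum forces $q_{ii} = 0$, so $Q$ also models a non-lazy walk.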
Submitted 5 November, 2014;
originally announced November 2014.
-
A Remark on Disk Packings and Numerical Integration of Harmonic Functions
Authors:
Stefan Steinerberger
Abstract:
We are interested in the following problem: given an open, bounded domain $Ω\subset \mathbb{R}^2$, what is the largest constant $α= α(Ω) > 0$ such that there exist an infinite sequence of disks $B_1, B_2, \dots, B_N, \dots \subset \mathbb{R}^2$ and a sequence $(n_i)$ with $n_i \in \left\{1,2\right\}$ such that $$ \sup_{N \in \mathbb{N}}{N^α\left\| χ_Ω - \sum_{i=1}^{N}{(-1)^{n_i}χ_{B_i}}\right\|_{L^1(\mathbb{R}^2)}} < \infty,$$ where $χ$ denotes the characteristic function? We prove that certain (somewhat peculiar) domains $Ω\subset \mathbb{R}^2$ satisfy this property with $α= 0.53$. For these domains there exists a sequence of points $(x_i)_{i=1}^{\infty}$ in $Ω$ with weights $(a_i)_{i=1}^{\infty}$ such that for all harmonic functions $u:\mathbb{R}^2 \rightarrow \mathbb{R}$ $$ \left|\int_Ω{u(x)dx} - \sum_{i=1}^{N}{a_i u(x_i)}\right| \leq C_Ω\frac{\|u\|_{L^{\infty}(Ω)}}{N^{0.53}},$$ where $C_Ω$ depends only on $Ω$. This gives a Quasi-Monte-Carlo method for harmonic functions which improves on the probabilistic Monte-Carlo bound $\|u\|_{L^{2}(Ω)}/N^{0.5}$ \textit{without} introducing a dependence on the total variation. We do not know which decay rates are optimal.
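The link between disk approximations and quadrature is the mean value property: for harmonic $u$, $\int_{B(c,r)} u\,dx = \pi r^2 u(c)$, so each signed disk $(-1)^{n_i}χ_{B_i}$ contributes one node $c_i$ with weight $(-1)^{n_i}\pi r_i^2$. The sketch below shows only this mechanism on a hypothetical domain (the unit disk minus a smaller interior disk, for which two disks represent $χ_Ω$ exactly); it does not reproduce the paper's peculiar domains or the $N^{-0.53}$ construction, and the helper names are illustrative.

```python
import numpy as np


def disk_rule(centers, radii, n):
    """Quadrature nodes and weights from the signed disks sum_i (-1)^{n_i} chi_{B_i}.

    By the mean value property, the integral of a harmonic u over the disk B(c, r)
    equals pi * r^2 * u(c), so each disk contributes one node with weight +-pi r^2.
    """
    centers = np.asarray(centers, dtype=float)
    signs = (-1.0) ** np.asarray(n)                  # n_i = 2 -> +1, n_i = 1 -> -1
    weights = signs * np.pi * np.asarray(radii, dtype=float) ** 2
    return centers, weights


def integrate(u, centers, weights):
    return float(np.sum(weights * u(centers[:, 0], centers[:, 1])))


# Usage: Omega = unit disk minus the disk of radius 0.4 centred at (0.3, 0), so
# chi_Omega = chi_{B_1} - chi_{B_2} exactly and the two-node rule is exact.
u = lambda x, y: x ** 2 - y ** 2 + 3 * x + 1          # harmonic: its Laplacian is 2 - 2 = 0
nodes, a = disk_rule([(0.0, 0.0), (0.3, 0.0)], [1.0, 0.4], [2, 1])
estimate = integrate(u, nodes, a)                     # pi * (1 - 0.16 * 1.99) = 0.6816 * pi
```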
Submitted 7 December, 2014; v1 submitted 31 March, 2014;
originally announced March 2014.