Search | arXiv e-print repository

Universal Approximation of Mean-Field Models via Transformers

Authors: Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia

Abstract: This paper investigates the use of transformers to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, leading to permutation-equivariant dynamics. First, we empirically demonstrate that transformers are well-suited for approximating a variety of mean field models, including the Cucker-Smale model for flocking and milling, and the mean-field system for training two-layer neural networks. We validate our numerical experiments via mathematical theory. Specifically, we prove that if a finite-dimensional transformer effectively approximates the finite-dimensional vector field governing the particle system, then the $L_2$ distance between the \textit{expected transformer} and the infinite-dimensional mean-field vector field can be uniformly bounded by a function of the number of particles observed during training. Leveraging this result, we establish theoretical bounds on the distance between the true mean-field dynamics and those obtained using the transformer.

Submitted 27 May, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.13991 [pdf, other]

Generalization for Least Squares Regression With Simple Spiked Covariances

Authors: Jiping Li, Rishi Sonthalia

Abstract: Random matrix theory has proven to be a valuable tool in analyzing the generalization of linear models. However, the generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood. To understand the generalization performance of such networks, it is crucial to characterize the spectrum of the feature matrix at the hidden layer. Recent work has made progress in this direction by describing the spectrum after a single gradient step, revealing a spiked covariance structure. Yet, the generalization error for linear models with spiked covariances has not been previously determined. This paper addresses this gap by examining two simple models exhibiting spiked covariances. We derive their generalization error in the asymptotic proportional regime. Our analysis demonstrates that the eigenvector and eigenvalue corresponding to the spike significantly influence the generalization error.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2406.04425 [pdf, other]

On Regularization via Early Stopping for Least Squares Regression

Authors: Rishi Sonthalia, Jackie Lok, Elizaveta Rebrova

Abstract: A fundamental problem in machine learning is understanding the effect of early stopping on the parameters obtained and the generalization capabilities of the model. Even for linear models, the effect is not fully understood for arbitrary learning rates and data. In this paper, we analyze the dynamics of discrete full batch gradient descent for linear regression. With minimal assumptions, we characterize the trajectory of the parameters and the expected excess risk. Using this characterization, we show that when training with a learning rate schedule $η_k$, and a finite time horizon $T$, the early stopped solution $β_T$ is equivalent to the minimum norm solution for a generalized ridge regularized problem. We also prove that early stopping is beneficial for generic data with arbitrary spectrum and for a wide variety of learning rate schedules. We provide an estimate for the optimal stopping time and empirically demonstrate the accuracy of our estimate.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03696 [pdf, other]

Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression

Authors: Jackie Lok, Rishi Sonthalia, Elizaveta Rebrova

Abstract: We study the discrete dynamics of mini-batch gradient descent with random reshuffling for least squares regression. We show that the training and generalization errors depend on a sample cross-covariance matrix $Z$ between the original features $X$ and a set of new features $\widetilde{X}$ in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. However, mini-batch gradient descent with random reshuffling exhibits a subtle dependence on the step size that a gradient flow analysis cannot detect, such as converging to a limit that depends on the step size. By comparing $Z$, a non-commutative polynomial of random matrices, with the sample covariance matrix of $X$ asymptotically, we demonstrate that batching affects the dynamics by resulting in a form of shrinkage on the spectrum.

Submitted 3 February, 2025; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 33 pages. Accepted at ALT 2025

arXiv:2403.07264 [pdf, other]

Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

Authors: Yutong Wang, Rishi Sonthalia, Wei Hu

Abstract: We study the generalization capability of nearly-interpolating linear regressors: $\boldsymbol{β}$'s whose training error $τ$ is positive but small, i.e., below the noise floor. Under a random matrix theoretic assumption on the data distribution and an eigendecay assumption on the data covariance matrix $\boldsymbol{Σ}$, we demonstrate that any near-interpolator exhibits rapid norm growth: for $τ$ fixed, $\boldsymbol{β}$ has squared $\ell_2$-norm $\mathbb{E}[\|\boldsymbol{β}\|_{2}^{2}] = Ω(n^α)$ where $n$ is the number of samples and $α>1$ is the exponent of the eigendecay, i.e., $λ_i(\boldsymbol{Σ}) \sim i^{-α}$. This implies that existing data-independent norm-based bounds are necessarily loose. On the other hand, in the same regime we precisely characterize the asymptotic trade-off between interpolation and generalization. Our characterization reveals that larger norm scaling exponents $α$ correspond to worse trade-offs between interpolation and generalization. We verify empirically that a similar phenomenon holds for nearly-interpolating shallow neural networks.

Submitted 11 March, 2024; originally announced March 2024.

Comments: AISTATS 2024

arXiv:2310.00729 [pdf, other]

Spectral Neural Networks: Approximation Theory and Optimization Landscape

Authors: Chenghui Li, Rishi Sonthalia, Nicolas Garcia Trillos

Abstract: There is a large variety of machine learning methodologies that are based on the extraction of spectral geometric information from data. However, the implementations of many of these methods often depend on traditional eigensolvers, which present limitations when applied in practical online big data scenarios. To address some of these challenges, researchers have proposed different strategies for training neural networks as alternatives to traditional eigensolvers, with one such approach known as Spectral Neural Network (SNN). In this paper, we investigate key theoretical aspects of SNN. First, we present quantitative insights into the tradeoff between the number of neurons and the amount of spectral geometric information a neural network learns. Second, we initiate a theoretical exploration of the optimization landscape of SNN's objective to shed light on the training dynamics of SNN. Unlike typical studies of convergence to global solutions of NN training dynamics, SNN presents an additional complexity due to its non-convex ambient loss function.

Submitted 1 October, 2023; originally announced October 2023.

arXiv:2305.17297 [pdf, other]

Double Descent and Overfitting under Noisy Inputs and Distribution Shift for Linear Denoisers

Authors: Chinmaya Kausik, Kashvi Srivastava, Rishi Sonthalia

Abstract: Despite the importance of denoising in modern machine learning and ample empirical work on supervised denoising, its theoretical understanding is still relatively scarce. One concern about studying supervised denoising is that one might not always have noiseless training data from the test distribution. It is more reasonable to have access to noiseless training data from a different dataset than the test dataset. Motivated by this, we study supervised denoising and noisy-input regression under distribution shift. We add three considerations to increase the applicability of our theoretical insights to real-life data and modern machine learning. First, while most past theoretical work assumes that the data covariance matrix is full-rank and well-conditioned, empirical studies have shown that real-life data is approximately low-rank. Thus, we assume that our data matrices are low-rank. Second, we drop independence assumptions on our data. Third, the rise in computational power and dimensionality of data have made it important to study non-classical regimes of learning. Thus, we work in the non-classical proportional regime, where data dimension $d$ and number of samples $N$ grow as $d/N = c + o(1)$. For this setting, we derive data-dependent, instance specific expressions for the test error for both denoising and noisy-input regression, and study when overfitting the noise is benign, tempered or catastrophic. We show that the test error exhibits double descent under general distribution shift, providing insights for data augmentation and the role of noise as an implicit regularizer. We also perform experiments using real-life data, where we match the theoretical predictions with under 1% MSE error for low-rank data.

Submitted 14 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: Complete overhaul of presentation, many new results

arXiv:2305.14689 [pdf, other]

Least Squares Regression Can Exhibit Under-Parameterized Double Descent

Authors: Xinyue Li, Rishi Sonthalia

Abstract: The relationship between the number of training data points, the number of parameters, and the generalization capabilities of models has been widely studied. Previous work has shown that double descent can occur in the over-parameterized regime and that the standard bias-variance trade-off holds in the under-parameterized regime. These works provide multiple reasons for the existence of the peak. We postulate that the location of the peak depends on the technical properties of both the spectrum as well as the eigenvectors of the sample covariance. We present two simple examples that provably exhibit double descent in the under-parameterized regime and do not seem to occur for reasons provided in prior work.

Submitted 24 October, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.14632 [pdf, other]

Supermodular Rank: Set Function Decomposition and Optimization

Authors: Rishi Sonthalia, Anna Seigal, Guido Montufar

Abstract: We define the supermodular rank of a function on a lattice. This is the smallest number of terms needed to decompose it into a sum of supermodular functions. The supermodular summands are defined with respect to different partial orders. We characterize the maximum possible value of the supermodular rank and describe the functions with fixed supermodular rank. We analogously define the submodular rank. We use submodular decompositions to optimize set functions. Given a bound on the submodular rank of a set function, we formulate an algorithm that splits an optimization problem into submodular subproblems. We show that this method improves the approximation ratio guarantees of several algorithms for monotone set function maximization and ratio of set functions minimization, at a computation overhead that depends on the submodular rank.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2210.00881 [pdf, other]

doi 10.1038/s42256-023-00735-0

Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network

Authors: Mario Krenn, Lorenzo Buffoni, Bruno Coutinho, Sagi Eppel, Jacob Gates Foster, Andrew Gritsevskiy, Harlin Lee, Yichao Lu, Joao P. Moutinho, Nima Sanjabi, Rishi Sonthalia, Ngoc Mai Tran, Francisco Valente, Yangxinyu Xie, Rose Yu, Michael Kopp

Abstract: A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could significantly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially over the last years, making it challenging for human researchers to keep track of the progress. Here, we use AI techniques to predict the future research directions of AI itself. We develop a new graph-based benchmark based on real-world data -- the Science4Cast benchmark, which aims to predict the future state of an evolving semantic network of AI. For that, we use more than 100,000 research papers and build up a knowledge network with more than 64,000 concept nodes. We then present ten diverse methods to tackle this task, ranging from pure statistical to pure learning methods. Surprisingly, the most powerful methods use a carefully curated set of network features, rather than an end-to-end AI approach. It indicates a great potential that can be unleashed for purely ML approaches without human knowledge. Ultimately, better predictions of new future research directions will be a crucial component of more advanced research suggestion tools.

Submitted 23 September, 2022; originally announced October 2022.

Comments: 13 pages, 7 figures. Comments welcome!

Journal ref: Nature Machine Intelligence 5, 1326 (2023)

arXiv:2206.09048 [pdf, ps, other]

doi 10.5281/zenodo.6554616

ICLR 2022 Challenge for Computational Geometry and Topology: Design and Results

Authors: Adele Myers, Saiteja Utpala, Shubham Talbar, Sophia Sanborn, Christian Shewmake, Claire Donnat, Johan Mathe, Umberto Lupo, Rishi Sonthalia, Xinyue Cui, Tom Szwagier, Arthur Pignet, Andri Bergsson, Soren Hauberg, Dmitriy Nielsen, Stefan Sommer, David Klindt, Erik Hermansen, Melvin Vaupel, Benjamin Dunn, Jeffrey Xiong, Noga Aharony, Itsik Pe'er, Felix Ambellan, Martin Hanik , et al. (3 additional authors not shown)

Abstract: This paper presents the computational challenge on differential geometry and topology that was hosted within the ICLR 2022 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide implementations of machine learning algorithms on manifolds that would respect the API of the open-source software Geomstats (manifold part) and Scikit-Learn (machine learning part) or PyTorch. The challenge attracted seven teams in its two month duration. This paper describes the design of the challenge and summarizes its main findings.

Submitted 26 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2110.11430 [pdf, other]

How can classical multidimensional scaling go wrong?

Authors: Rishi Sonthalia, Gregory Van Buskirk, Benjamin Raichel, Anna C. Gilbert

Abstract: Given a matrix $D$ describing the pairwise dissimilarities of a data set, a common task is to embed the data points into Euclidean space. The classical multidimensional scaling (cMDS) algorithm is a widespread method to do this. However, theoretical analysis of the robustness of the algorithm and an in-depth analysis of its performance on non-Euclidean metrics is lacking. In this paper, we derive a formula, based on the eigenvalues of a matrix obtained from $D$, for the Frobenius norm of the difference between $D$ and the metric $D_{\text{cmds}}$ returned by cMDS. This error analysis leads us to the conclusion that when the derived matrix has a significant number of negative eigenvalues, then $\|D-D_{\text{cmds}}\|_F$, after initially decreasing, will eventually increase as we increase the dimension. Hence, counterintuitively, the quality of the embedding degrades as we increase the dimension. We empirically verify that the Frobenius norm increases as we increase the dimension for a variety of non-Euclidean metrics. We also show on several benchmark datasets that this degradation in the embedding results in the classification accuracy of both simple (e.g., 1-nearest neighbor) and complex (e.g., multi-layer neural nets) classifiers decreasing as we increase the embedding dimension. Finally, our analysis leads us to a new efficiently computable algorithm that returns a matrix $D_l$ that is at least as close to the original distances as $D_t$ (the Euclidean metric closest in $\ell_2$ distance). While $D_l$ is not metric, when given as input to cMDS instead of $D$, it empirically results in solutions whose distance to $D$ does not increase when we increase the dimension and the classification accuracy degrades less than the cMDS solution.

Submitted 28 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2110.04932 [pdf, other]

An Analysis of COVID-19 Knowledge Graph Construction and Applications

Authors: Dominic Flocco, Bryce Palmer-Toy, Ruixiao Wang, Hongyu Zhu, Rishi Sonthalia, Junyuan Lin, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: The construction and application of knowledge graphs have seen a rapid increase across many disciplines in recent years. Additionally, the problem of uncovering relationships between developments in the COVID-19 pandemic and social media behavior is of great interest to researchers hoping to curb the spread of the disease. In this paper we present a knowledge graph constructed from COVID-19 related tweets in the Los Angeles area, supplemented with federal and state policy announcements and disease spread statistics. By incorporating dates, topics, and events as entities, we construct a knowledge graph that describes the connections between these pieces of useful information. We use natural language processing and change point analysis to extract tweet-topic, tweet-date, and event-date relations. Further analysis on the constructed knowledge graph provides insight into how tweets reflect public sentiments towards COVID-19 related topics and how changes in these sentiments correlate with real-world events.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2012.03126 [pdf, other]

Dual Regularized Optimal Transport

Authors: Rishi Sonthalia, Anna C. Gilbert

Abstract: In this paper, we present a new formulation of unbalanced optimal transport called Dual Regularized Optimal Transport (DROT). We argue that regularizing the dual formulation of optimal transport results in a version of unbalanced optimal transport that leads to sparse solutions and that gives us control over mass creation and destruction. We build intuition behind such control and present theoretical properties of the solutions to DROT. We demonstrate that due to recent advances in optimization techniques, we can feasibly solve such a formulation at large scales and present extensive experimental evidence for this formulation and its solution.

Submitted 5 December, 2020; originally announced December 2020.

arXiv:2005.03853 [pdf, other]

Project and Forget: Solving Large-Scale Metric Constrained Problems

Authors: Rishi Sonthalia, Anna C. Gilbert

Abstract: Given a set of dissimilarity measurements amongst data points, determining what metric representation is most "consistent" with the input measurements or the metric that best captures the relevant geometric features of the data is a key step in many machine learning algorithms. Existing methods are restricted to specific kinds of metrics or small problem sizes because of the large number of metric constraints in such problems. In this paper, we provide an active set algorithm, Project and Forget, that uses Bregman projections to solve metric constrained problems with many (possibly exponentially many) inequality constraints. We provide a theoretical analysis of Project and Forget and prove that our algorithm converges to the global optimal solution and that the $L_2$ distance of the current iterate to the optimal solution decays asymptotically at an exponential rate. We demonstrate that using our method we can solve large problem instances of three types of metric constrained problems: general weight correlation clustering, metric nearness, and metric learning; in each case, out-performing the state of the art methods with respect to CPU times and problem sizes.

Submitted 26 September, 2022; v1 submitted 8 May, 2020; originally announced May 2020.

arXiv:2005.03847 [pdf, other]

Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding

Authors: Rishi Sonthalia, Anna C. Gilbert

Abstract: Given data, finding a faithful low-dimensional hyperbolic embedding of the data is a key method by which we can extract hierarchical information or learn representative geometric features of the data. In this paper, we explore a new method for learning hyperbolic representations by taking a metric-first approach. Rather than determining the low-dimensional hyperbolic embedding directly, we learn a tree structure on the data. This tree structure can then be used directly to extract hierarchical information, embedded into a hyperbolic manifold using Sarkar's construction, or used as a tree approximation of the original metric. To this end, we present a novel fast algorithm TreeRep such that, given a $δ$-hyperbolic metric (for any $δ\geq 0$), the algorithm learns a tree structure that approximates the original metric. In the case when $δ= 0$, we show analytically that TreeRep exactly recovers the original tree structure. We show empirically that TreeRep is not only many orders of magnitude faster than previously known algorithms, but also produces metrics with lower average distortion and higher mean average precision than most previous algorithms for learning hyperbolic embeddings, extracting hierarchical information, and approximating metrics via tree metrics.

Submitted 22 October, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: Code available at https://github.com/rsonthal/TreeRep

arXiv:1908.08411 [pdf, other]

Generalized Metric Repair on Graphs

Authors: Chenglin Fan, Anna C. Gilbert, Benjamin Raichel, Rishi Sonthalia, Gregory Van Buskirk

Abstract: Many modern data analysis algorithms either assume or are considerably more efficient if the distances between the data points satisfy a metric. These algorithms include metric learning, clustering, and dimension reduction. As real data sets are noisy, distances often fail to satisfy a metric. For this reason, Gilbert and Jain and Fan et al. introduced the closely related sparse metric repair and metric violation distance problems. The goal of these problems is to repair as few distances as possible to ensure they satisfy a metric. Three variants were considered, one admitting a polynomial time algorithm. The other variants were shown to be APX-hard, and an $O(OPT^{1/3})$-approximation was given, where $OPT$ is the optimal solution size. In this paper, we generalize these problems to no longer consider all distances between the data points. That is, we consider a weighted graph $G$ with corrupted weights $w$, and our goal is to find the smallest number of weight modifications so that the resulting weighted graph distances satisfy a metric. This is a natural generalization and is more flexible as it takes into account different relationships among the data points. As in previous work, we distinguish among the types of repairs permitted and focus on the increase only and general versions. We demonstrate the inherent combinatorial structure of the problem, and give an approximation-preserving reduction from MULTICUT. Conversely, we show that for any fixed constant $ς$, for the large class of $ς$-chordal graphs, the problems are fixed parameter tractable. Call a cycle broken if it contains an edge whose weight is larger than the sum of all its other edges, and call the amount of this difference its deficit. We present approximation algorithms, one which depends on the maximum number of edges in a broken cycle, and one which depends on the number of distinct deficit values.

Submitted 21 August, 2019; originally announced August 2019.

Comments: arXiv admin note: text overlap with arXiv:1807.08078

arXiv:1807.07619 [pdf, other]

Generalized Metric Repair on Graphs

Authors: Anna C. Gilbert, Rishi Sonthalia

Abstract: Many modern data analysis algorithms either assume that or are considerably more efficient if the distances between the data points satisfy a metric. These algorithms include metric learning, clustering, and dimensionality reduction. Because real data sets are noisy, the similarity measures often fail to satisfy a metric. For this reason, Gilbert and Jain [11] and Fan, et al. [8] introduce the closely related problems of $\textit{sparse metric repair}$ and $\textit{metric violation distance}$. The goal of each problem is to repair as few distances as possible to ensure that the distances between the data points satisfy a metric. We generalize these problems so as to no longer require all the distances between the data points. That is, we consider a weighted graph $G$ with corrupted weights $w$ and our goal is to find the smallest number of modifications to the weights so that the resulting weighted graph distances satisfy a metric. This problem is a natural generalization of the sparse metric repair problem and is more flexible as it takes into account different relationships amongst the input data points. As in previous work, we distinguish amongst the types of repairs permitted (decrease, increase, and general repairs). We focus on the increase and general versions and establish hardness results and show the inherent combinatorial structure of the problem. We then show that if we restrict to the case when $G$ is a chordal graph, then the problem is fixed parameter tractable. We also present several classes of approximation algorithms. These include and improve upon previous metric repair algorithms for the special case when $G = K_n$.

Submitted 19 July, 2018; originally announced July 2018.

arXiv:1807.07610 [pdf, other]

doi 10.1109/ALLERTON.2018.8635955

Unsupervised Metric Learning in Presence of Missing Data

Authors: Anna C. Gilbert, Rishi Sonthalia

Abstract: For many machine learning tasks, the input data lie on a low-dimensional manifold embedded in a high-dimensional space and, because of this high-dimensional structure, most algorithms are inefficient. The typical solution is to reduce the dimension of the input data using standard dimension reduction algorithms such as ISOMAP, LAPLACIAN EIGENMAPS or LLES. This approach, however, does not always work in practice, as these algorithms require that we have somewhat ideal data. Unfortunately, most data sets either have missing entries or unacceptably noisy values. That is, real data are far from ideal and we cannot use these algorithms directly. In this paper, we focus on the case when we have missing data. Some techniques, such as matrix completion, can be used to fill in missing data, but these methods do not capture the non-linear structure of the manifold. Here, we present a new algorithm, MR-MISSING, that extends these previous algorithms and can be used to compute low-dimensional representations of data sets with missing entries. We demonstrate the effectiveness of our algorithm by running three different experiments. We visually verify the effectiveness of our algorithm on synthetic manifolds, we numerically compare our projections against those computed by first filling in data using nlPCA and mDRUR on the MNIST data set, and we also show that we can do classification on MNIST with missing data. We also provide a theoretical guarantee for MR-MISSING under some simplifying assumptions.

Submitted 3 March, 2019; v1 submitted 19 July, 2018; originally announced July 2018.

Journal ref: 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

Showing 1–19 of 19 results for author: Sonthalia, R