-
LUMÁWIG: An Efficient Algorithm for Dimension Zero Bottleneck Distance Computation in Topological Data Analysis
Authors:
Paul Samuel Ignacio,
Jay-Anne Bulauan,
David Uminsky
Abstract:
Stability of persistence diagrams under slight perturbations is a key characteristic behind the validity and growing popularity of topological data analysis in exploring real-world data. Central to this stability is the bottleneck distance, which entails matching points between diagrams. Use of this metric in practical studies has, however, been rare because of its computational cost, especially in dimension zero, where the cost explodes as the data size grows. We present LUMÁWIG, a novel, efficient algorithm for computing the dimension zero bottleneck distance between two persistence diagrams; it runs significantly faster and approximates the output of the original algorithm significantly more sharply than any other available algorithm. We bypass the overwhelming matching problem in previous implementations of the bottleneck distance and prove that the zero-dimensional bottleneck distance can be recovered from a very small number of matching cases. Empirical tests show that LUMÁWIG generally enjoys linear complexity. We also present an application that leverages dimension zero persistence diagrams and the bottleneck distance to produce features for classification tasks.
Submitted 9 October, 2020; v1 submitted 1 October, 2020;
originally announced October 2020.
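For readers unfamiliar with the matching problem this abstract refers to, the following is a minimal brute-force sketch of the bottleneck distance between two tiny persistence diagrams. It is not LUMÁWIG: it enumerates every matching, which is exactly the cost the paper aims to avoid. The function names (`linf`, `to_diagonal`, `bottleneck_bruteforce`) are hypothetical, and the sketch assumes finite (birth, death) pairs under the L-infinity metric, with unmatched points sent to the diagonal.

```python
from itertools import permutations


def linf(p, q):
    """L-infinity distance between two (birth, death) points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def to_diagonal(p):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    return abs(p[1] - p[0]) / 2.0


def bottleneck_bruteforce(X, Y):
    """Exact bottleneck distance by enumerating all matchings (tiny inputs only)."""
    n, m = len(X), len(Y)
    size = n + m  # pad each diagram with diagonal slots for the other's points

    def cost(i, j):
        if i < n and j < m:
            return linf(X[i], Y[j])      # point matched to point
        if i < n:
            return to_diagonal(X[i])     # X point matched to the diagonal
        if j < m:
            return to_diagonal(Y[j])     # Y point matched to the diagonal
        return 0.0                       # diagonal slot matched to diagonal slot

    best = float("inf")
    for perm in permutations(range(size)):
        worst = max((cost(i, perm[i]) for i in range(size)), default=0.0)
        best = min(best, worst)
    return best


# Dimension zero diagrams: every class is born at 0.
X = [(0.0, 1.0), (0.0, 3.0)]
Y = [(0.0, 1.2), (0.0, 2.0)]
distance = bottleneck_bruteforce(X, Y)
```

The loop visits (n + m)! permutations, so this is only usable on a handful of points; it serves as a reference against which a faster method could be checked.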
-
The Problem with Metrics is a Fundamental Problem for AI
Authors:
Rachel Thomas,
David Uminsky
Abstract:
Optimizing a given metric is a central aspect of most current AI approaches, yet overemphasizing metrics leads to manipulation, gaming, a myopic focus on short-term goals, and other unexpected negative consequences. This poses a fundamental contradiction for AI development. Through a series of real-world case studies, we look at various aspects of where metrics go wrong in practice and aspects of how our online environment and current business practices are exacerbating these failures. Finally, we propose a framework towards mitigating the harms caused by overemphasis of metrics within AI by: (1) using a slate of metrics to get a fuller and more nuanced picture, (2) combining metrics with qualitative accounts, and (3) involving a range of stakeholders, including those who will be most impacted.
Submitted 19 February, 2020;
originally announced February 2020.
-
Classification of Single-lead Electrocardiograms: TDA Informed Machine Learning
Authors:
Paul Samuel Ignacio,
David Uminsky,
Christopher Dunstan,
Esteban Escobar,
Luke Trujillo
Abstract:
Atrial fibrillation is a heart condition characterized by erratic heart rhythms caused by chaotic propagation of electrical impulses in the atria, leading to numerous health complications. State-of-the-art models employ complex algorithms that extract expert-informed features to improve diagnosis. In this note, we demonstrate how topological features can be used to help accurately classify single-lead electrocardiograms. Via delay embeddings, we map electrocardiograms onto high-dimensional point clouds that convert periodic signals to algebraically computable topological signatures. We derive features from persistent signatures, input them to a simple machine learning algorithm, and benchmark its performance against winning entries in the 2017 Physionet Computing in Cardiology Challenge.
Submitted 27 November, 2019; v1 submitted 25 November, 2019;
originally announced November 2019.
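The delay embedding mentioned in this abstract can be illustrated with a minimal sketch. This is a generic sliding-window (Takens-style) embedding, not the authors' pipeline; the function name `delay_embed` and the parameters `dim` (embedding dimension) and `tau` (delay in samples) are assumptions chosen for illustration.

```python
def delay_embed(signal, dim, tau):
    """Map a 1-D signal to points (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    # Number of windows that fit entirely inside the signal.
    n_points = len(signal) - (dim - 1) * tau
    return [
        tuple(signal[i + j * tau] for j in range(dim))
        for i in range(max(n_points, 0))
    ]


# A short toy signal standing in for an ECG trace.
toy_signal = [0, 1, 2, 3, 4]
cloud = delay_embed(toy_signal, dim=3, tau=1)
```

Each tuple is one point of the resulting point cloud; a periodic signal traces out a closed loop in this space, which is the kind of structure persistent homology can then detect.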
-
A Non-iterative Parallelizable Eigenbasis Algorithm for Johnson Graphs
Authors:
Jackson Abascal,
Amadou Bah,
Mario Banuelos,
David Uminsky,
Olivia Vasquez
Abstract:
We present a new $O(k^2 \binom{n}{k}^2)$ method for generating an orthogonal basis of eigenvectors for the Johnson graph $J(n,k)$. Unlike standard methods for computing a full eigenbasis of sparse symmetric matrices, the algorithm presented here is non-iterative and produces exact results under an infinite-precision computation model. In addition, our method is highly parallelizable; given access to unlimited parallel processors, the eigenbasis can be constructed in only $O(n)$ time given $n$ and $k$. We also present an algorithm for computing projections onto the eigenspaces of $J(n,k)$ in parallel time $O(n)$.
Submitted 11 December, 2018;
originally announced December 2018.
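As background for this entry, the sketch below builds the Johnson graph $J(n,k)$ (vertices are the $k$-subsets of an $n$-set, adjacent when they share $k-1$ elements) and lists its well-known closed-form adjacency spectrum: eigenvalue $(k-j)(n-k-j)-j$ with multiplicity $\binom{n}{j}-\binom{n}{j-1}$ for $j = 0, \dots, \min(k, n-k)$. It does not construct eigenvectors and is not the paper's algorithm; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def johnson_adjacency(n, k):
    """Adjacency lists of J(n, k): k-subsets adjacent iff they share k-1 elements."""
    verts = [frozenset(c) for c in combinations(range(n), k)]
    return {v: [w for w in verts if len(v & w) == k - 1] for v in verts}


def johnson_spectrum(n, k):
    """Closed-form (eigenvalue, multiplicity) pairs of the adjacency matrix."""
    pairs = []
    for j in range(min(k, n - k) + 1):
        eig = (k - j) * (n - k - j) - j
        mult = comb(n, j) - (comb(n, j - 1) if j > 0 else 0)
        pairs.append((eig, mult))
    return pairs


adj = johnson_adjacency(5, 2)
spectrum = johnson_spectrum(5, 2)

# Sanity checks via trace identities for an adjacency matrix A:
# trace(A) = 0 and trace(A^2) = sum of vertex degrees.
trace_a = sum(eig * mult for eig, mult in spectrum)
trace_a2 = sum(eig * eig * mult for eig, mult in spectrum)
degree_sum = sum(len(nbrs) for nbrs in adj.values())
```

The trace identities give a cheap consistency check between the explicit graph and the closed-form spectrum without any numerical eigensolver.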
-
The power of A/B testing under interference
Authors:
James D. Wilson,
David T. Uminsky
Abstract:
In this paper, we address the fundamental statistical question: how can you assess the power of an A/B test when the units in the study are exposed to interference? This question is germane to many scientific and industrial practitioners who rely on A/B testing in environments where control over interference is limited. We begin by proving that interference has a measurable effect on the test's sensitivity, or power. We quantify the power of an A/B test of equality of means as a function of the number of exposed individuals under any interference mechanism. We further derive a central limit theorem for the number of exposed individuals under a simple Bernoulli switching interference mechanism. Based on these results, we develop a strategy to estimate the power of an A/B test when actors experience interference according to an observed network model. We demonstrate how to leverage this theory to estimate the power of an A/B test on units sharing any network relationship, and highlight the utility of our method in two applications: a Facebook friendship network and a large Twitter follower network. These results yield, for the first time, the capacity to design an A/B test that detects, with a specified confidence, a fixed measurable treatment effect when the test is conducted under interference driven by networks.
Submitted 10 October, 2017;
originally announced October 2017.
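For orientation, the following sketch computes the textbook power of a two-sided, two-sample z-test of equality of means when there is no interference. It is only the classical baseline, not the interference-adjusted power derived in the paper. The function name `power_two_sample_z` and its parameters (true mean difference `delta`, common known standard deviation `sigma`, equal arm size `n_per_arm`) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(delta, sigma, n_per_arm, alpha=0.05):
    """Power of a two-sided z-test for a difference in means, no interference."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized effect: true difference over the standard error of the
    # difference of two sample means with n_per_arm units each.
    shift = delta / (sigma * sqrt(2.0 / n_per_arm))
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power_small = power_two_sample_z(delta=0.5, sigma=1.0, n_per_arm=50)
power_large = power_two_sample_z(delta=0.5, sigma=1.0, n_per_arm=100)
power_null = power_two_sample_z(delta=0.0, sigma=1.0, n_per_arm=100)
```

With no true effect the power reduces to the significance level, and it grows with the sample size per arm; the paper's question is how this picture changes when some units are exposed to interference.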
-
Multiclass Total Variation Clustering
Authors:
Xavier Bresson,
Thomas Laurent,
David Uminsky,
James H. von Brecht
Abstract:
Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches.
Submitted 5 June, 2013;
originally announced June 2013.