-
Bregman-Hausdorff divergence: strengthening the connections between computational geometry and machine learning
Authors:
Tuyen Pham,
Hana Dal Poz Kouřimská,
Hubert Wagner
Abstract:
The purpose of this paper is twofold. On a technical side, we propose an extension of the Hausdorff distance from metric spaces to spaces equipped with asymmetric distance measures. Specifically, we focus on the family of Bregman divergences, which includes the popular Kullback--Leibler divergence (also known as relative entropy).
As a proof of concept, we use the resulting Bregman--Hausdorff divergence to compare two collections of probabilistic predictions produced by different machine learning models trained using the relative entropy loss. The algorithms we propose are surprisingly efficient even for large inputs with hundreds of dimensions.
In addition to the introduction of this technical concept, we provide a survey. It outlines the basics of Bregman geometry, as well as computational geometry algorithms. We focus on algorithms that are compatible with this geometry and are relevant for machine learning.
Submitted 9 April, 2025;
originally announced April 2025.
-
Fast Kd-trees for the Kullback--Leibler Divergence and other Decomposable Bregman Divergences
Authors:
Tuyen Pham,
Hubert Wagner
Abstract:
The contributions of the paper span theoretical and implementational results. First, we prove that Kd-trees can be extended to spaces in which the distance is measured with an arbitrary Bregman divergence. Perhaps surprisingly, this shows that the triangle inequality is not necessary for correct pruning in Kd-trees. Second, we offer an efficient algorithm and C++ implementation for nearest neighbour search for decomposable Bregman divergences.
The implementation supports the Kullback--Leibler divergence (relative entropy) which is a popular distance between probability vectors and is commonly used in statistics and machine learning. This is a step toward broadening the usage of computational geometry algorithms.
Our benchmarks show that our implementation efficiently handles both exact and approximate nearest neighbour queries. Compared to a naive approach, we achieve two orders of magnitude speedup for practical scenarios in dimension up to 100. Our solution is simpler and more efficient than competing methods.
Submitted 18 February, 2025;
originally announced February 2025.
-
High Resolution Tree Height Mapping of the Amazon Forest using Planet NICFI Images and LiDAR-Informed U-Net Model
Authors:
Fabien H Wagner,
Ricardo Dalagnol,
Griffin Carter,
Mayumi CM Hirye,
Shivraj Gill,
Le Bienfaiteur Sagang Takougoum,
Samuel Favrichon,
Michael Keller,
Jean PHB Ometto,
Lorena Alves,
Cynthia Creze,
Stephanie P George-Chacon,
Shuang Li,
Zhihua Liu,
Adugna Mullissa,
Yan Yang,
Erone G Santos,
Sarah R Worden,
Martin Brandt,
Philippe Ciais,
Stephen C Hagen,
Sassan Saatchi
Abstract:
Tree canopy height is one of the most important indicators of forest biomass, productivity, and ecosystem structure, but it is challenging to measure accurately from the ground and from space. Here, we used a U-Net model adapted for regression to map the mean tree canopy height in the Amazon forest from Planet NICFI images at ~4.78 m spatial resolution for the period 2020-2024. The U-Net model was trained using canopy height models computed from aerial LiDAR data as a reference, along with their corresponding Planet NICFI images. Predictions of tree heights on the validation sample exhibited a mean error of 3.68 m and showed relatively low systematic bias across the entire range of tree heights present in the Amazon forest. Our model successfully estimated canopy heights up to 40-50 m without much saturation, outperforming existing canopy height products from global models in this region. We determined that the Amazon forest has an average canopy height of ~22 m. Events such as logging or deforestation could be detected from changes in tree height, and encouraging results were obtained to monitor the height of regenerating forests. These findings demonstrate the potential for large-scale mapping and monitoring of tree height for old and regenerating Amazon forests using Planet NICFI imagery.
Submitted 17 January, 2025;
originally announced January 2025.
-
A Systematic Approach to Crossing Numbers of Cartesian Products with Paths
Authors:
Zayed Asiri,
Ryan Burdett,
Markus Chimani,
Michael Haythorpe,
Alex Newcombe,
Mirko H. Wagner
Abstract:
Determining the crossing numbers of Cartesian products of small graphs with arbitrarily large paths has been an ongoing topic of research since the 1970s. Doing so requires the establishment of coincident upper and lower bounds; the former is usually demonstrated by providing a suitable drawing procedure, while the latter often requires substantial theoretical arguments. Many such papers have been published, which typically focus on just one or two small graphs at a time, and use ad hoc arguments specific to those graphs. We propose a general approach which, when successful, establishes the required lower bound. This approach can be applied to the Cartesian product of any graph with arbitrarily large paths, and in each case involves solving a modified version of the crossing number problem on a finite number (typically only two or three) of small graphs. We demonstrate the potency of this approach by applying it to Cartesian products involving all 133 graphs $G$ of orders five or six, and show that it is successful in 128 cases. This includes 60 cases which a recent survey listed as either undetermined, or determined only in journals without adequate peer review.
Submitted 10 September, 2024;
originally announced September 2024.
-
On the Uncrossed Number of Graphs
Authors:
Martin Balko,
Petr Hliněný,
Tomáš Masařík,
Joachim Orthaber,
Birgit Vogtenhuber,
Mirko H. Wagner
Abstract:
Visualizing a graph $G$ in the plane nicely, for example, without crossings, is unfortunately not always possible. To address this problem, Masařík and Hliněný [GD 2023] recently asked for each edge of $G$ to be drawn without crossings while allowing multiple different drawings of $G$. More formally, a collection $\mathcal{D}$ of drawings of $G$ is uncrossed if, for each edge $e$ of $G$, there is a drawing in $\mathcal{D}$ such that $e$ is uncrossed. The uncrossed number $\mathrm{unc}(G)$ of $G$ is then the minimum number of drawings in some uncrossed collection of $G$.
No exact values of the uncrossed numbers have been determined yet, not even for simple graph classes. In this paper, we provide the exact values for uncrossed numbers of complete and complete bipartite graphs, partly confirming and partly refuting a conjecture posed by Hliněný and Masařík. We also present a strong general lower bound on $\mathrm{unc}(G)$ in terms of the number of vertices and edges of $G$. Moreover, we prove NP-hardness of the related problem of determining the edge crossing number of a graph $G$, which is the smallest number of edges of $G$ taken over all drawings of $G$ that participate in a crossing. This problem was posed as open by Schaefer in his book [Crossing Numbers of Graphs 2018].
Submitted 17 June, 2025; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Crossing Numbers of Beyond Planar Graphs Re-revisited: A Framework Approach
Authors:
Markus Chimani,
Torben Donzelmann,
Nick Kloster,
Melissa Koch,
Jan-Jakob Völlering,
Mirko H. Wagner
Abstract:
Beyond planarity concepts (prominent examples include k-planarity or fan-planarity) apply certain restrictions on the allowed patterns of crossings in drawings. It is natural to ask how much the number of crossings may increase over the traditional (unrestricted) crossing number. Previous approaches to bound such ratios, e.g. [arXiv:1908.03153, arXiv:2105.12452], require very specialized constructions and arguments for each considered beyond planarity concept, and mostly only yield asymptotically non-tight bounds. We propose a very general proof framework that allows us to obtain asymptotically tight bounds, and where the concept-specific parts of the proof typically boil down to a couple of lines. We show the strength of our approach by giving improved or first bounds for several beyond planarity concepts.
Submitted 4 September, 2024; v1 submitted 6 July, 2024;
originally announced July 2024.
-
Exact Minimum Weight Spanners via Column Generation
Authors:
Fritz Bökler,
Markus Chimani,
Henning Jasper,
Mirko H. Wagner
Abstract:
Given a weighted graph $G$, a minimum weight $α$-spanner is a least-weight subgraph $H\subseteq G$ that preserves minimum distances between all node pairs up to a factor of $α$. There are many results on heuristics and approximation algorithms, including a recent investigation of their practical performance [20]. Exact approaches, in contrast, have long been denounced as impractical: The first exact ILP (integer linear program) method [48] from 2004 is based on a model with exponentially many path variables, solved via column generation. A second approach [2], modeling via arc-based multicommodity flow, was presented in 2019. In both cases, only graphs with 40-100 nodes were reported to be solvable.
In this paper, we briefly report on a theoretical comparison between these two models from a polyhedral point of view, and then concentrate on improvements and engineering aspects. We evaluate their performance in a large-scale empirical study. We report that our tuned column generation approach, based on multicriteria shortest path computations, is able to solve instances with over 16000 nodes within 13 minutes. Furthermore, now knowing optimal solutions for larger graphs, we are able to investigate the quality of the strongest known heuristic on reasonably sized instances for the first time.
Submitted 27 June, 2024;
originally announced June 2024.
-
Computing Representatives of Persistent Homology Generators with a Double Twist
Authors:
Tuyen Pham,
Hubert Wagner
Abstract:
With the growing availability of efficient tools, persistent homology is becoming a useful methodology in a variety of applications. Significant work has been devoted to implementing tools for persistent homology diagrams; however, computing representative cycles corresponding to each point in the diagram can still be inefficient. To circumvent this problem, we extend the twist algorithm of Chen and Kerber. Our extension is based on a new technique we call saving, which supplements their existing killing technique. The resulting two-pass strategy can be realized using an existing matrix reduction implementation as a black-box and improves the efficiency of computing representatives of persistent homology generators. We prove the correctness of the new approach and experimentally show its performance.
Submitted 6 March, 2024;
originally announced March 2024.
-
Mixup Barcodes: Quantifying Geometric-Topological Interactions between Point Clouds
Authors:
Hubert Wagner,
Nickolas Arustamyan,
Matthew Wheeler,
Peter Bubenik
Abstract:
We combine standard persistent homology with image persistent homology to define a novel way of characterizing shapes and interactions between them. In particular, we introduce: (1) a mixup barcode, which captures geometric-topological interactions (mixup) between two point sets in arbitrary dimension; (2) simple summary statistics, total mixup and total percentage mixup, which quantify the complexity of the interactions as a single number; (3) a software tool for playing with the above.
As a proof of concept, we apply this tool to a problem arising from machine learning. In particular, we study the disentanglement in embeddings of different classes. The results suggest that topological mixup is a useful method for characterizing interactions for low and high-dimensional data. Compared to the typical usage of persistent homology, the new tool is sensitive to the geometric locations of the topological features, which is often desirable.
Submitted 5 December, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Amazon's 2023 Drought: Sentinel-1 Reveals Extreme Rio Negro River Contraction
Authors:
Fabien H Wagner,
Samuel Favrichon,
Ricardo Dalagnol,
Mayumi CM Hirye,
Adugna Mullissa,
Sassan Saatchi
Abstract:
The Amazon, the world's largest rainforest, faces a severe historic drought. The Rio Negro River, one of the major Amazon River tributaries, reaches its lowest level in a century in October 2023. Here, we used a U-net deep learning model to map water surfaces in the Rio Negro River basin every 12 days in 2022 and 2023 using 10 m spatial resolution Sentinel-1 satellite radar images. The accuracy of the water surface model was high with an F1-score of 0.93. The 12-day mosaic time series of water surface was generated from the Sentinel-1 prediction. The water surface mask demonstrated relatively consistent agreement with the Global Surface Water (GSW) product from the Joint Research Centre (F1-score: 0.708) and with the Brazilian Mapbiomas Water initiative (F1-score: 0.686). The main errors of the map were omission errors in flooded woodland, in flooded shrub and because of clouds. Rio Negro water surfaces reached their lowest level around the 25th of November 2023 and were reduced to 68.1\% (9,559.9 km$^2$) of the maximum water surfaces observed in the period 2022-2023 (14,036.3 km$^2$). Synthetic Aperture Radar (SAR) data, in conjunction with deep learning techniques, can significantly improve near real-time mapping of water surface in tropical regions.
Submitted 29 January, 2024;
originally announced January 2024.
-
Sub-Meter Tree Height Mapping of California using Aerial Images and LiDAR-Informed U-Net Model
Authors:
Fabien H Wagner,
Sophia Roberts,
Alison L Ritz,
Griffin Carter,
Ricardo Dalagnol,
Samuel Favrichon,
Mayumi CM Hirye,
Martin Brandt,
Philipe Ciais,
Sassan Saatchi
Abstract:
Tree canopy height is one of the most important indicators of forest biomass, productivity, and species diversity, but it is challenging to measure accurately from the ground and from space. Here, we used a U-Net model adapted for regression to map the canopy height of all trees in the state of California with very high-resolution aerial imagery (60 cm) from the USDA-NAIP program. The U-Net model was trained using canopy height models computed from aerial LiDAR data as a reference, along with corresponding RGB-NIR NAIP images collected in 2020. We evaluated the performance of the deep-learning model using 42 independent 1 km$^2$ sites across various forest types and landscape variations in California. Our predictions of tree heights exhibited a mean error of 2.9 m and showed relatively low systematic bias across the entire range of tree heights present in California. In 2020, trees taller than 5 m covered ~ 19.3% of California. Our model successfully estimated canopy heights up to 50 m without saturation, outperforming existing canopy height products from global models. The approach we used allowed for the reconstruction of the three-dimensional structure of individual trees as observed from nadir-looking optical airborne imagery, suggesting a relatively robust estimation and mapping capability, even in the presence of image distortion. These findings demonstrate the potential of large-scale mapping and monitoring of tree height, as well as potential biomass estimation, using NAIP imagery.
Submitted 2 June, 2023;
originally announced June 2023.
-
Mapping Tropical Forest Cover and Deforestation with Planet NICFI Satellite Images and Deep Learning in Mato Grosso State (Brazil) from 2015 to 2021
Authors:
Fabien H Wagner,
Ricardo Dalagnol,
Celso HL Silva-Junior,
Griffin Carter,
Alison L Ritz,
Mayumi CM Hirye,
Jean PHB Ometto,
Sassan Saatchi
Abstract:
Monitoring changes in tree cover for rapid assessment of deforestation is considered the critical component of any climate mitigation policy for reducing carbon. Here, we map tropical tree cover and deforestation between 2015 and 2022 using 5 m spatial resolution Planet NICFI satellite images over the state of Mato Grosso (MT) in Brazil and a U-net deep learning model. The tree cover for the state was 556510.8 km$^2$ in 2015 (58.1 % of the MT State) and was reduced to 141598.5 km$^2$ (14.8 % of total area) at the end of 2021. After reaching a minimum deforested area in December 2016 with 6632.05 km$^2$, the bi-annual deforestation area only showed a slight increase between December 2016 and December 2019. A year after, the areas of deforestation almost doubled from 9944.5 km$^2$ in December 2019 to 19817.8 km$^2$ in December 2021. The high-resolution data product showed relatively consistent agreement with the official deforestation map from Brazil (67.2%) but deviated significantly from year of forest cover loss estimates from the Global Forest Change (GFC) product, mainly due to the large area of fire degradation observed in the GFC data. High-resolution imagery from Planet NICFI associated with deep learning techniques can significantly improve mapping of deforestation extent in the tropics.
Submitted 17 November, 2022;
originally announced November 2022.
-
PaMILO: A Solver for Multi-Objective Mixed Integer Linear Optimization and Beyond
Authors:
Fritz Bökler,
Levin Nemesch,
Mirko H. Wagner
Abstract:
In multi-objective optimization, several potentially conflicting objective functions need to be optimized. Instead of one optimal solution, we look for the set of so-called non-dominated solutions.
An important subset is the set of non-dominated extreme points. Finding it is a computationally hard problem in general. While solvers for similar problems exist, there are none known for multi-objective mixed integer linear programs (MOMILPs) or multi-objective mixed integer quadratically constrained quadratic programs (MOMIQCQPs). We present PaMILO, the first solver for finding non-dominated extreme points of MOMILPs and MOMIQCQPs. It can be found on GitHub under github.com/FritzBo/PaMILO. PaMILO provides an easy-to-use interface and is implemented in C++17. It solves occurring subproblems employing either CPLEX or Gurobi.
PaMILO adapts the Dual-Benson algorithm for multi-objective linear programming (MOLP). As it was previously only defined for MOLPs, we describe how it can be adapted for MOMILPs, MOMIQCQPs and even more problem classes in the future.
Submitted 21 April, 2023; v1 submitted 19 July, 2022;
originally announced July 2022.
-
K-textures, a self-supervised hard clustering deep learning algorithm for satellite image segmentation
Authors:
Fabien H. Wagner,
Ricardo Dalagnol,
Alber H. Sánchez,
Mayumi C. M. Hirye,
Samuel Favrichon,
Jake H. Lee,
Steffen Mauceri,
Yan Yang,
Sassan Saatchi
Abstract:
Deep learning self-supervised algorithms that can segment an image into a fixed number of hard labels, as the k-means algorithm does, while relying only on deep learning techniques are still lacking. Here, we introduce the k-textures algorithm which provides self-supervised segmentation of a 4-band image (RGB-NIR) for a $k$ number of classes. An example of its application on high resolution Planet satellite imagery is given. Our algorithm shows that discrete search is feasible using convolutional neural networks (CNN) and gradient descent. The model detects $k$ hard clustering classes represented in the model as $k$ discrete binary masks and their associated $k$ independently generated textures, that combined are a simulation of the original image. The similarity loss is the mean squared error between the features of the original and the simulated image, both extracted from the penultimate convolutional block of Keras 'imagenet' pretrained VGG-16 model and a custom feature extractor made with Planet data. The main advances of the k-textures model are: first, the $k$ discrete binary masks are obtained inside the model using gradient descent. The model allows for the generation of discrete binary masks using a novel method based on a hard sigmoid activation function. Second, it provides hard clustering classes -- each pixel has only one class. Finally, in comparison to k-means, where each pixel is considered independently, here, contextual information is also considered and each class is not associated only to similar values in the color channels but also to a texture. Our approach is designed to ease the production of training samples for satellite image segmentation and the k-textures architecture could be adapted to support a different number of bands and for more complex tasks, such as object self-segmentation. The model codes and weights are available at https://doi.org/10.5281/zenodo.6359859
Submitted 27 May, 2022; v1 submitted 17 May, 2022;
originally announced May 2022.
-
GPU Computation of the Euler Characteristic Curve for Imaging Data
Authors:
Fan Wang,
Hubert Wagner,
Chao Chen
Abstract:
Persistent homology is perhaps the most popular and useful tool offered by topological data analysis, with point-cloud data being the most common setup. Its older cousin, the Euler characteristic curve (ECC) is less expressive, but far easier to compute. It is particularly suitable for analyzing imaging data, and is commonly used in fields ranging from astrophysics to biomedical image analysis. These fields are embracing GPU computations to handle increasingly large datasets. We therefore propose an optimized GPU implementation of ECC computation for 2D and 3D grayscale images. The goal of this paper is twofold. First, we offer a practical tool, illustrating its performance with thorough experimentation, but also explain its inherent shortcomings. Second, this simple algorithm serves as a perfect backdrop for highlighting basic GPU programming techniques that make our implementation so efficient, and some common pitfalls we avoided. This is intended as a step towards a wider usage of GPU programming in computational geometry and topology software. We find this is particularly important as geometric and topological tools are used in conjunction with modern, GPU-accelerated machine learning frameworks.
△ Less
Submitted 3 March, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
A Simple Standard for Sharing Ontological Mappings (SSSOM)
Authors:
Nicolas Matentzoglu,
James P. Balhoff,
Susan M. Bello,
Chris Bizon,
Matthew Brush,
Tiffany J. Callahan,
Christopher G Chute,
William D. Duncan,
Chris T. Evelo,
Davera Gabriel,
John Graybeal,
Alasdair Gray,
Benjamin M. Gyori,
Melissa Haendel,
Henriette Harmse,
Nomi L. Harris,
Ian Harrow,
Harshad Hegde,
Amelia L. Hoyt,
Charles T. Hoyt,
Dazhi Jiao,
Ernesto Jiménez-Ruiz,
Simon Jupp,
Hyeongsik Kim,
Sebastian Koehler
, et al. (19 additional authors not shown)
Abstract:
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones.
The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy to use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard.
In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.
Submitted 13 December, 2021;
originally announced December 2021.
-
Properties of Large 2-Crossing-Critical Graphs
Authors:
Drago Bokal,
Markus Chimani,
Alexander Nover,
Jöran Schierbaum,
Tobias Stolzmann,
Mirko H. Wagner,
Tilo Wiedera
Abstract:
A $c$-crossing-critical graph is one that has crossing number at least $c$ but each of its proper subgraphs has crossing number less than $c$. Recently, a set of explicit construction rules was identified by Bokal, Oporowski, Richter, and Salazar to generate all large $2$-crossing-critical graphs (i.e., all apart from a finite set of small sporadic graphs). They share the property of containing a generalized Wagner graph $V_{10}$ as a subdivision.
In this paper, we study these graphs and establish their order, simple crossing number, edge cover number, clique number, maximum degree, chromatic number, chromatic index, and treewidth. We also show that the graphs are linear-time recognizable and that all our proofs lead to efficient algorithms for the above measures.
Submitted 9 December, 2021;
originally announced December 2021.
-
Topological Detection of Trojaned Neural Networks
Authors:
Songzhu Zheng,
Yikai Zhang,
Hubert Wagner,
Mayank Goswami,
Chao Chen
Abstract:
Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model's behavior through Trojaned training samples, which can later be exploited.
Guided by basic neuroscientific principles we discover subtle -- yet critical -- structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from input to output layers.
Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.
Submitted 11 June, 2021;
originally announced June 2021.
-
An Experimental Study of ILP Formulations for the Longest Induced Path Problem
Authors:
Fritz Bökler,
Markus Chimani,
Mirko H. Wagner,
Tilo Wiedera
Abstract:
Given a graph $G=(V,E)$, the longest induced path problem asks for a maximum cardinality node subset $W\subseteq V$ such that the graph induced by $W$ is a path. It is a long-established problem with applications, e.g., in network analysis. We propose novel integer linear programming (ILP) formulations for the problem and discuss efficient implementations thereof. Comparing them with known formulations from the literature, we prove that they are beneficial in theory, yielding stronger relaxations. Moreover, our experiments show their practical superiority.
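To make the problem statement above concrete, here is a minimal brute-force sketch in Python. It is not one of the ILP formulations from the paper; it only illustrates the definition on tiny graphs. The adjacency representation (a dict mapping each node to a set of neighbours) and the function names are assumptions chosen for illustration, and the running time is exponential in the number of nodes.

```python
from itertools import combinations

def is_induced_path(adj, nodes):
    """Return True if the subgraph of `adj` induced by `nodes` is a path."""
    nodes = set(nodes)
    if len(nodes) == 1:
        return True
    # Degrees inside the induced subgraph only.
    degrees = sorted(len(adj[v] & nodes) for v in nodes)
    # A path has exactly two end nodes of degree 1, all others of degree 2.
    if degrees != [1, 1] + [2] * (len(nodes) - 2):
        return False
    # The degree condition alone also admits a path plus disjoint cycles,
    # so connectivity is checked with a depth-first search.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def longest_induced_path(adj):
    """Exhaustively search for a largest node subset inducing a path."""
    vertices = list(adj)
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if is_induced_path(adj, subset):
                return set(subset)
    return set()
```

For example, in a cycle on five nodes any four consecutive nodes induce a path, whereas in a complete graph no induced path has more than two nodes.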
Submitted 17 October, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Topological Data Analysis in Information Space
Authors:
Herbert Edelsbrunner,
Ziga Virk,
Hubert Wagner
Abstract:
Various kinds of data are routinely represented as discrete probability distributions. Examples include text documents summarized by histograms of word occurrences and images represented as histograms of oriented gradients. Viewing a discrete probability distribution as a point in the standard simplex of the appropriate dimension, we can understand collections of such objects in geometric and topological terms. Importantly, instead of using the standard Euclidean distance, we look into dissimilarity measures with information-theoretic justification, and we develop the theory needed for applying topological data analysis in this setting. In doing so, we emphasize constructions that enable usage of existing computational topology software in this context.
Submitted 28 March, 2019; v1 submitted 20 March, 2019;
originally announced March 2019.
-
Streaming Algorithm for Euler Characteristic Curves of Multidimensional Images
Authors:
Teresa Heiss,
Hubert Wagner
Abstract:
We present an efficient algorithm to compute Euler characteristic curves of gray scale images of arbitrary dimension. In various applications the Euler characteristic curve is used as a descriptor of an image.
Our algorithm is the first streaming algorithm for Euler characteristic curves. The usage of streaming removes the necessity to store the entire image in RAM. Experiments show that our implementation handles terabyte scale images on commodity hardware. Due to lock-free parallelism, it scales well with the number of processor cores. Our software---CHUNKYEuler---is available as open source on Bitbucket.
Additionally, we put the concept of the Euler characteristic curve in the wider context of computational topology. In particular, we explain the connection with persistence diagrams.
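As a rough illustration of what an Euler characteristic curve is, the following Python sketch computes it for a small 2D grayscale image held entirely in memory. It is not the streaming algorithm described above and is unrelated to the CHUNKYEuler code. It assumes one common convention, stated here as an assumption: each pixel is a closed unit square, and the curve records the Euler characteristic (vertices minus edges plus faces) of the sublevel set at each threshold. The function names are hypothetical.

```python
import numpy as np

def euler_characteristic(mask):
    """Euler characteristic V - E + F of a union of closed unit pixels."""
    # Pad with False so boundary vertices and edges are handled uniformly.
    m = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    faces = m[1:-1, 1:-1].sum()
    # A vertex is present if any of the four pixels touching it is present.
    vertices = (m[:-1, :-1] | m[:-1, 1:] | m[1:, :-1] | m[1:, 1:]).sum()
    # A horizontal edge is present if the pixel above or below is present.
    h_edges = (m[:-1, 1:-1] | m[1:, 1:-1]).sum()
    # A vertical edge is present if the pixel to its left or right is present.
    v_edges = (m[1:-1, :-1] | m[1:-1, 1:]).sum()
    return int(vertices - h_edges - v_edges + faces)

def euler_characteristic_curve(image, thresholds):
    """Euler characteristic of the sublevel set {pixel <= t} for each t."""
    image = np.asarray(image)
    return [euler_characteristic(image <= t) for t in thresholds]
```

On the 3x3 image with rows `[0, 1, 0]`, `[1, 2, 1]`, `[0, 1, 0]`, the sublevel sets at thresholds 0, 1 and 2 are four isolated corner pixels, a ring of eight pixels, and the full square, so the curve is `[4, 0, 1]`.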
Submitted 17 October, 2018; v1 submitted 4 May, 2017;
originally announced May 2017.
-
Solving equations and optimization problems with uncertainty
Authors:
Peter Franek,
Marek Krčál,
Hubert Wagner
Abstract:
We study the problem of detecting zeros of continuous functions that are known only up to an error bound, extending the earlier theoretical work with explicit algorithms and experiments with an implementation. More formally, the robustness of zero of a continuous map $f: X\to \mathbb{R}^n$ is the maximal $r>0$ such that each $g:X\to\mathbb{R}^n$ with $\|f-g\|_\infty\le r$ has a zero. We develop and implement an efficient algorithm approximating the robustness of zero. Further, we show how to use the algorithm for approximating worst-case optima in optimization problems in which the feasible domain is defined by equations that are only known approximately.
An important ingredient is an algorithm for deciding the topological extension problem based on computing cohomological obstructions to extendability and their persistence. We describe an explicit algorithm for the primary and secondary obstruction, two stages of a sequence of algorithms with increasing complexity. We provide experimental evidence that for random Gaussian fields, the primary obstruction---a much less computationally demanding test than the secondary obstruction---is typically sufficient for approximating robustness of zero.
Submitted 27 September, 2017; v1 submitted 21 July, 2016;
originally announced July 2016.
-
Topological Data Analysis with Bregman Divergences
Authors:
Herbert Edelsbrunner,
Hubert Wagner
Abstract:
Given a finite set in a metric space, the topological analysis generalizes hierarchical clustering using a 1-parameter family of homology groups to quantify connectivity in all dimensions. The connectivity is compactly described by the persistence diagram. One limitation of the current framework is the reliance on metric distances, whereas in many practical applications objects are compared by non-metric dissimilarity measures. Examples are the Kullback-Leibler divergence, which is commonly used for comparing text and images, and the Itakura-Saito divergence, popular for speech and sound. These are two members of the broad family of dissimilarities called Bregman divergences.
We show that the framework of topological data analysis can be extended to general Bregman divergences, widening the scope of possible applications. In particular, we prove that appropriately generalized Cech and Delaunay (alpha) complexes capture the correct homotopy type, namely that of the corresponding union of Bregman balls. Consequently, their filtrations give the correct persistence diagram, namely the one generated by the uniformly growing Bregman balls. Moreover, we show that unlike the metric setting, the filtration of Vietoris-Rips complexes may fail to approximate the persistence diagram. We propose algorithms to compute the thus generalized Cech, Vietoris-Rips and Delaunay complexes and experimentally test their efficiency. Lastly, we explain their surprisingly good performance by making a connection with discrete Morse theory.
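For readers unfamiliar with Bregman divergences, a short Python sketch may help. It uses the standard definition $D_F(p\|q) = F(p) - F(q) - \langle \nabla F(q), p - q\rangle$ for a strictly convex, differentiable generator $F$. Choosing the negative Shannon entropy as $F$ gives the (generalized) Kullback-Leibler divergence, and choosing the Burg entropy gives the Itakura-Saito divergence. The function names are illustrative assumptions, inputs are assumed to have strictly positive entries, and none of this code comes from the paper.

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(F(p) - F(q) - np.dot(grad_F(q), p - q))

# Negative Shannon entropy: generates the (generalized) KL divergence.
def neg_entropy(x):
    return float(np.sum(x * np.log(x)))

def neg_entropy_grad(x):
    return np.log(x) + 1.0

# Burg entropy: generates the Itakura-Saito divergence.
def burg_entropy(x):
    return float(-np.sum(np.log(x)))

def burg_entropy_grad(x):
    return -1.0 / x

def kl_divergence(p, q):
    return bregman_divergence(neg_entropy, neg_entropy_grad, p, q)

def itakura_saito(p, q):
    return bregman_divergence(burg_entropy, burg_entropy_grad, p, q)
```

Swapping the arguments generally changes the value, which is the asymmetry that rules out treating these divergences as metric distances.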
Submitted 21 July, 2016;
originally announced July 2016.
-
Computing homology and persistent homology using iterated Morse decomposition
Authors:
Paweł Dłotko,
Hubert Wagner
Abstract:
In this paper we present a new approach to computing homology (with field coefficients) and persistent homology. We use concepts from discrete Morse theory to provide an algorithm that can be expressed solely in terms of simple graph-theoretical operations. We use iterated Morse decomposition, which allows us to sidestep many problems related to standard discrete Morse theory. In particular, this approach is provably correct in any dimension.
Submitted 25 October, 2012; v1 submitted 4 October, 2012;
originally announced October 2012.