
Bregman-Hausdorff divergence: strengthening the connections between computational geometry and machine learning

Authors: Tuyen Pham, Hana Dal Poz Kouřimská, Hubert Wagner

Abstract: The purpose of this paper is twofold. On a technical side, we propose an extension of the Hausdorff distance from metric spaces to spaces equipped with asymmetric distance measures. Specifically, we focus on the family of Bregman divergences, which includes the popular Kullback--Leibler divergence (also known as relative entropy). As a proof of concept, we use the resulting Bregman--Hausdorff divergence to compare two collections of probabilistic predictions produced by different machine learning models trained using the relative entropy loss. The algorithms we propose are surprisingly efficient even for large inputs with hundreds of dimensions. In addition to the introduction of this technical concept, we provide a survey. It outlines the basics of Bregman geometry, as well as computational geometry algorithms. We focus on algorithms that are compatible with this geometry and are relevant for machine learning.

Submitted 9 April, 2025; originally announced April 2025.

Comments: 23 pages, 11 figures, 3 tables, 3 algorithms, submitted to Machine Learning and Knowledge Extraction
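
To make the idea in the abstract above concrete, here is a minimal brute-force sketch of a Hausdorff-style comparison of two sets of probability vectors under the Kullback--Leibler divergence. The function names and the particular max-min form are assumptions chosen for illustration; the paper's own definitions and its efficient algorithms may differ.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete probability vectors; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def directed_hausdorff_kl(P, Q):
    """Hypothetical directed variant: the largest KL divergence from a point of P
    to its nearest point in Q (max over p of min over q of KL(p || q))."""
    return max(min(kl_divergence(p, q) for q in Q) for p in P)


# Two small collections of predictions over two classes.
P = [[0.5, 0.5]]
Q = [[0.5, 0.5], [0.9, 0.1]]

# Because KL is asymmetric and the max-min form is directed,
# swapping the arguments generally changes the value.
forward = directed_hausdorff_kl(P, Q)
backward = directed_hausdorff_kl(Q, P)
```

This quadratic-time loop is only a baseline; it shows why the quantity is asymmetric in two ways (the divergence itself and the direction of the max-min).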

arXiv:2502.13425 [pdf, other]

Fast Kd-trees for the Kullback--Leibler Divergence and other Decomposable Bregman Divergences

Authors: Tuyen Pham, Hubert Wagner

Abstract: The contributions of the paper span theoretical and implementational results. First, we prove that Kd-trees can be extended to spaces in which the distance is measured with an arbitrary Bregman divergence. Perhaps surprisingly, this shows that the triangle inequality is not necessary for correct pruning in Kd-trees. Second, we offer an efficient algorithm and C++ implementation for nearest neighbour search for decomposable Bregman divergences. The implementation supports the Kullback--Leibler divergence (relative entropy) which is a popular distance between probability vectors and is commonly used in statistics and machine learning. This is a step toward broadening the usage of computational geometry algorithms. Our benchmarks show that our implementation efficiently handles both exact and approximate nearest neighbour queries. Compared to a naive approach, we achieve two orders of magnitude speedup for practical scenarios in dimension up to 100. Our solution is simpler and more efficient than competing methods.

Submitted 18 February, 2025; originally announced February 2025.

Comments: 18 pages, 4 tables, 5 figures in main paper, 1 table in appendix. Submitted to WADS 2025

arXiv:2501.10600 [pdf, other]

High Resolution Tree Height Mapping of the Amazon Forest using Planet NICFI Images and LiDAR-Informed U-Net Model

Authors: Fabien H Wagner, Ricardo Dalagnol, Griffin Carter, Mayumi CM Hirye, Shivraj Gill, Le Bienfaiteur Sagang Takougoum, Samuel Favrichon, Michael Keller, Jean PHB Ometto, Lorena Alves, Cynthia Creze, Stephanie P George-Chacon, Shuang Li, Zhihua Liu, Adugna Mullissa, Yan Yang, Erone G Santos, Sarah R Worden, Martin Brandt, Philippe Ciais, Stephen C Hagen, Sassan Saatchi

Abstract: Tree canopy height is one of the most important indicators of forest biomass, productivity, and ecosystem structure, but it is challenging to measure accurately from the ground and from space. Here, we used a U-Net model adapted for regression to map the mean tree canopy height in the Amazon forest from Planet NICFI images at ~4.78 m spatial resolution for the period 2020-2024. The U-Net model was trained using canopy height models computed from aerial LiDAR data as a reference, along with their corresponding Planet NICFI images. Predictions of tree heights on the validation sample exhibited a mean error of 3.68 m and showed relatively low systematic bias across the entire range of tree heights present in the Amazon forest. Our model successfully estimated canopy heights up to 40-50 m without much saturation, outperforming existing canopy height products from global models in this region. We determined that the Amazon forest has an average canopy height of ~22 m. Events such as logging or deforestation could be detected from changes in tree height, and encouraging results were obtained to monitor the height of regenerating forests. These findings demonstrate the potential for large-scale mapping and monitoring of tree height for old and regenerating Amazon forests using Planet NICFI imagery.

Submitted 17 January, 2025; originally announced January 2025.

Comments: will be submitted to the journal Remote Sensing of Environment in February 2025

MSC Class: 92-08 ACM Class: I.4.8

arXiv:2409.06755 [pdf, other]

A Systematic Approach to Crossing Numbers of Cartesian Products with Paths

Authors: Zayed Asiri, Ryan Burdett, Markus Chimani, Michael Haythorpe, Alex Newcombe, Mirko H. Wagner

Abstract: Determining the crossing numbers of Cartesian products of small graphs with arbitrarily large paths has been an ongoing topic of research since the 1970s. Doing so requires the establishment of coincident upper and lower bounds; the former is usually demonstrated by providing a suitable drawing procedure, while the latter often requires substantial theoretical arguments. Many such papers have been published, which typically focus on just one or two small graphs at a time, and use ad hoc arguments specific to those graphs. We propose a general approach which, when successful, establishes the required lower bound. This approach can be applied to the Cartesian product of any graph with arbitrarily large paths, and in each case involves solving a modified version of the crossing number problem on a finite number (typically only two or three) of small graphs. We demonstrate the potency of this approach by applying it to Cartesian products involving all 133 graphs $G$ of orders five or six, and show that it is successful in 128 cases. This includes 60 cases which a recent survey listed as either undetermined, or determined only in journals without adequate peer review.

Submitted 10 September, 2024; originally announced September 2024.

MSC Class: 05C62; 68R10 ACM Class: G.2.2

arXiv:2407.21206 [pdf, ps, other]

doi 10.4230/LIPIcs.GD.2024.18

On the Uncrossed Number of Graphs

Authors: Martin Balko, Petr Hliněný, Tomáš Masařík, Joachim Orthaber, Birgit Vogtenhuber, Mirko H. Wagner

Abstract: Visualizing a graph $G$ in the plane nicely, for example, without crossings, is unfortunately not always possible. To address this problem, Masařík and Hliněný [GD 2023] recently asked for each edge of $G$ to be drawn without crossings while allowing multiple different drawings of $G$. More formally, a collection $\mathcal{D}$ of drawings of $G$ is uncrossed if, for each edge $e$ of $G$, there is a drawing in $\mathcal{D}$ such that $e$ is uncrossed. The uncrossed number $\mathrm{unc}(G)$ of $G$ is then the minimum number of drawings in some uncrossed collection of $G$. No exact values of the uncrossed numbers have been determined yet, not even for simple graph classes. In this paper, we provide the exact values for uncrossed numbers of complete and complete bipartite graphs, partly confirming and partly refuting a conjecture posed by Hliněný and Masařík. We also present a strong general lower bound on $\mathrm{unc}(G)$ in terms of the number of vertices and edges of $G$. Moreover, we prove NP-hardness of the related problem of determining the edge crossing number of a graph $G$, which is the smallest number of edges of $G$ taken over all drawings of $G$ that participate in a crossing. This problem was posed as open by Schaefer in his book [Crossing Numbers of Graphs 2018].

Submitted 17 June, 2025; v1 submitted 30 July, 2024; originally announced July 2024.

Comments: Appears in the Proceedings of the 32nd International Symposium on Graph Drawing and Network Visualization (GD 2024). 20 pages, 7 figures

MSC Class: 68R10 ACM Class: G.2.2; F.2.2

arXiv:2407.05057 [pdf, other]

Crossing Numbers of Beyond Planar Graphs Re-revisited: A Framework Approach

Authors: Markus Chimani, Torben Donzelmann, Nick Kloster, Melissa Koch, Jan-Jakob Völlering, Mirko H. Wagner

Abstract: Beyond planarity concepts (prominent examples include k-planarity or fan-planarity) apply certain restrictions on the allowed patterns of crossings in drawings. It is natural to ask, how much the number of crossings may increase over the traditional (unrestricted) crossing number. Previous approaches to bound such ratios, e.g. [arXiv:1908.03153, arXiv:2105.12452], require very specialized constructions and arguments for each considered beyond planarity concept, and mostly only yield asymptotically non-tight bounds. We propose a very general proof framework that allows us to obtain asymptotically tight bounds, and where the concept-specific parts of the proof typically boil down to a couple of lines. We show the strength of our approach by giving improved or first bounds for several beyond planarity concepts.

Submitted 4 September, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

Comments: 19 pages, 5 figures

MSC Class: 68R10; 05C62 ACM Class: G.2.2

arXiv:2406.19164 [pdf, other]

Exact Minimum Weight Spanners via Column Generation

Authors: Fritz Bökler, Markus Chimani, Henning Jasper, Mirko H. Wagner

Abstract: Given a weighted graph $G$, a minimum weight $α$-spanner is a least-weight subgraph $H\subseteq G$ that preserves minimum distances between all node pairs up to a factor of $α$. There are many results on heuristics and approximation algorithms, including a recent investigation of their practical performance [20]. Exact approaches, in contrast, have long been denounced as impractical: The first exact ILP (integer linear program) method [48] from 2004 is based on a model with exponentially many path variables, solved via column generation. A second approach [2], modeling via arc-based multicommodity flow, was presented in 2019. In both cases, only graphs with 40-100 nodes were reported to be solvable. In this paper, we briefly report on a theoretical comparison between these two models from a polyhedral point of view, and then concentrate on improvements and engineering aspects. We evaluate their performance in a large-scale empirical study. We report that our tuned column generation approach, based on multicriteria shortest path computations, is able to solve instances with over 16000 nodes within 13 minutes. Furthermore, now knowing optimal solutions for larger graphs, we are able to investigate the quality of the strongest known heuristic on reasonably sized instances for the first time.

Submitted 27 June, 2024; originally announced June 2024.

Comments: Conference version to be published in ESA 2024

MSC Class: 68R10 (Primary) 05C85; 90C11 (Secondary) ACM Class: F.2.2; G.2.1; G.2.2
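
The spanner condition stated in the abstract above can be checked directly on small instances. The sketch below is a hypothetical verifier, not the paper's column generation method: it assumes an undirected graph with non-negative weights, given as `(u, v, weight)` triples over nodes `0..n-1`, and uses Floyd-Warshall for all-pairs distances.

```python
import math


def all_pairs_distances(n, edges):
    """Floyd-Warshall on an undirected weighted graph given as (u, v, w) triples."""
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_spanner(n, g_edges, h_edges, alpha):
    """True if distances in H are at most alpha times the distances in G for all pairs."""
    dg = all_pairs_distances(n, g_edges)
    dh = all_pairs_distances(n, h_edges)
    return all(dh[u][v] <= alpha * dg[u][v] for u in range(n) for v in range(n))


# A unit-weight triangle G and the two-edge path H obtained by dropping edge (0, 2).
G = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
H = [(0, 1, 1), (1, 2, 1)]
```

Here `H` stretches the distance between nodes 0 and 2 from 1 to 2, so it is a 2-spanner of `G` but not a 1.5-spanner. Finding the least-weight such `H` is the hard optimization problem the paper addresses; this check only validates a candidate.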

arXiv:2403.04100 [pdf, other]

Computing Representatives of Persistent Homology Generators with a Double Twist

Authors: Tuyen Pham, Hubert Wagner

Abstract: With the growing availability of efficient tools, persistent homology is becoming a useful methodology in a variety of applications. Significant work has been devoted to implementing tools for persistent homology diagrams; however, computing representative cycles corresponding to each point in the diagram can still be inefficient. To circumvent this problem, we extend the twist algorithm of Chen and Kerber. Our extension is based on a new technique we call saving, which supplements their existing killing technique. The resulting two-pass strategy can be realized using an existing matrix reduction implementation as a black-box and improves the efficiency of computing representatives of persistent homology generators. We prove the correctness of the new approach and experimentally show its performance.

Submitted 6 March, 2024; originally announced March 2024.

Journal ref: Canadian Conference on Computational Geometry, Volume 35, 283-290 (2023)

arXiv:2402.15058 [pdf, other]

Mixup Barcodes: Quantifying Geometric-Topological Interactions between Point Clouds

Authors: Hubert Wagner, Nickolas Arustamyan, Matthew Wheeler, Peter Bubenik

Abstract: We combine standard persistent homology with image persistent homology to define a novel way of characterizing shapes and interactions between them. In particular, we introduce: (1) a mixup barcode, which captures geometric-topological interactions (mixup) between two point sets in arbitrary dimension; (2) simple summary statistics, total mixup and total percentage mixup, which quantify the complexity of the interactions as a single number; (3) a software tool for playing with the above. As a proof of concept, we apply this tool to a problem arising from machine learning. In particular, we study the disentanglement in embeddings of different classes. The results suggest that topological mixup is a useful method for characterizing interactions for low and high-dimensional data. Compared to the typical usage of persistent homology, the new tool is sensitive to the geometric locations of the topological features, which is often desirable.

Submitted 5 December, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.16393 [pdf, other]

Amazon's 2023 Drought: Sentinel-1 Reveals Extreme Rio Negro River Contraction

Authors: Fabien H Wagner, Samuel Favrichon, Ricardo Dalagnol, Mayumi CM Hirye, Adugna Mullissa, Sassan Saatchi

Abstract: The Amazon, the world's largest rainforest, faces a severe historic drought. The Rio Negro River, one of the major Amazon River tributaries, reaches its lowest level in a century in October 2023. Here, we used a U-net deep learning model to map water surfaces in the Rio Negro River basin every 12 days in 2022 and 2023 using 10 m spatial resolution Sentinel-1 satellite radar images. The accuracy of the water surface model was high with an F1-score of 0.93. The 12 days mosaic time series of water surface was generated from the Sentinel-1 prediction. The water surface mask demonstrated relatively consistent agreement with the Global Surface Water (GSW) product from Joint Research Centre (F1-score: 0.708) and with the Brazilian Mapbiomas Water initiative (F1-score: 0.686). The main errors of the map were omission errors in flooded woodland, in flooded shrub and because of clouds. Rio Negro water surfaces reached their lowest level around the 25th of November 2023 and were reduced to 68.1% (9,559.9 km$^2$) of the maximum water surfaces observed in the period 2022-2023 (14,036.3 km$^2$). Synthetic Aperture Radar (SAR) data, in conjunction with deep learning techniques, can significantly improve near real-time mapping of water surface in tropical regions.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 17 pages, 6 figures, 1 table

MSC Class: 92F05 ACM Class: I.4.6

arXiv:2306.01936 [pdf, other]

Sub-Meter Tree Height Mapping of California using Aerial Images and LiDAR-Informed U-Net Model

Authors: Fabien H Wagner, Sophia Roberts, Alison L Ritz, Griffin Carter, Ricardo Dalagnol, Samuel Favrichon, Mayumi CM Hirye, Martin Brandt, Philipe Ciais, Sassan Saatchi

Abstract: Tree canopy height is one of the most important indicators of forest biomass, productivity, and species diversity, but it is challenging to measure accurately from the ground and from space. Here, we used a U-Net model adapted for regression to map the canopy height of all trees in the state of California with very high-resolution aerial imagery (60 cm) from the USDA-NAIP program. The U-Net model was trained using canopy height models computed from aerial LiDAR data as a reference, along with corresponding RGB-NIR NAIP images collected in 2020. We evaluated the performance of the deep-learning model using 42 independent 1 km$^2$ sites across various forest types and landscape variations in California. Our predictions of tree heights exhibited a mean error of 2.9 m and showed relatively low systematic bias across the entire range of tree heights present in California. In 2020, trees taller than 5 m covered ~19.3% of California. Our model successfully estimated canopy heights up to 50 m without saturation, outperforming existing canopy height products from global models. The approach we used allowed for the reconstruction of the three-dimensional structure of individual trees as observed from nadir-looking optical airborne imagery, suggesting a relatively robust estimation and mapping capability, even in the presence of image distortion. These findings demonstrate the potential of large-scale mapping and monitoring of tree height, as well as potential biomass estimation, using NAIP imagery.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 29 pages, 9 figures, submitted to Remote Sensing in Ecology and Conservation (RSEC)

MSC Class: 92-08 ACM Class: I.4.9; I.5.4

arXiv:2211.09806 [pdf, other]

Mapping Tropical Forest Cover and Deforestation with Planet NICFI Satellite Images and Deep Learning in Mato Grosso State (Brazil) from 2015 to 2021

Authors: Fabien H Wagner, Ricardo Dalagnol, Celso HL Silva-Junior, Griffin Carter, Alison L Ritz, Mayumi CM Hirye, Jean PHB Ometto, Sassan Saatchi

Abstract: Monitoring changes in tree cover for rapid assessment of deforestation is considered the critical component of any climate mitigation policy for reducing carbon. Here, we map tropical tree cover and deforestation between 2015 and 2022 using 5 m spatial resolution Planet NICFI satellite images over the state of Mato Grosso (MT) in Brazil and a U-net deep learning model. The tree cover for the state was 556510.8 km$^2$ in 2015 (58.1 % of the MT State) and was reduced to 141598.5 km$^2$ (14.8 % of total area) at the end of 2021. After reaching a minimum deforested area in December 2016 with 6632.05 km$^2$, the bi-annual deforestation area only showed a slight increase between December 2016 and December 2019. A year after, the areas of deforestation almost doubled from 9944.5 km$^2$ in December 2019 to 19817.8 km$^2$ in December 2021. The high-resolution data product showed relatively consistent agreement with the official deforestation map from Brazil (67.2%) but deviated significantly from year of forest cover loss estimates from the Global Forest Change (GFC) product, mainly due to large area of fire degradation observed in the GFC data. High-resolution imagery from Planet NICFI associated with deep learning techniques can significantly improve mapping deforestation extent in tropics.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 18 pages, 10 figures, submitted to Remote Sensing MDPI, Special Issue "Remote Sensing of the Amazon Region"

arXiv:2207.09155 [pdf, other]

PaMILO: A Solver for Multi-Objective Mixed Integer Linear Optimization and Beyond

Authors: Fritz Bökler, Levin Nemesch, Mirko H. Wagner

Abstract: In multi-objective optimization, several potentially conflicting objective functions need to be optimized. Instead of one optimal solution, we look for the set of so-called non-dominated solutions. An important subset is the set of non-dominated extreme points. Finding it is a computationally hard problem in general. While solvers for similar problems exist, there are none known for multi-objective mixed integer linear programs (MOMILPs) or multi-objective mixed integer quadratically constrained quadratic programs (MOMIQCQPs). We present PaMILO, the first solver for finding non-dominated extreme points of MOMILPs and MOMIQCQPs. It can be found on GitHub under github.com/FritzBo/PaMILO. PaMILO provides an easy-to-use interface and is implemented in C++17. It solves occurring subproblems employing either CPLEX or Gurobi. PaMILO adapts the Dual-Benson algorithm for multi-objective linear programming (MOLP). As it was previously only defined for MOLPs, we describe how it can be adapted for MOMILPs, MOMIQCQPs and even more problem classes in the future.

Submitted 21 April, 2023; v1 submitted 19 July, 2022; originally announced July 2022.

arXiv:2205.08671 [pdf, other]

K-textures, a self-supervised hard clustering deep learning algorithm for satellite image segmentation

Authors: Fabien H. Wagner, Ricardo Dalagnol, Alber H. Sánchez, Mayumi C. M. Hirye, Samuel Favrichon, Jake H. Lee, Steffen Mauceri, Yan Yang, Sassan Saatchi

Abstract: Deep learning self-supervised algorithms that can segment an image in a fixed number of hard labels such as the k-means algorithm and relying only on deep learning techniques are still lacking. Here, we introduce the k-textures algorithm which provides self-supervised segmentation of a 4-band image (RGB-NIR) for a $k$ number of classes. An example of its application on high resolution Planet satellite imagery is given. Our algorithm shows that discrete search is feasible using convolutional neural networks (CNN) and gradient descent. The model detects $k$ hard clustering classes represented in the model as $k$ discrete binary masks and their associated $k$ independently generated textures, that combined are a simulation of the original image. The similarity loss is the mean squared error between the features of the original and the simulated image, both extracted from the penultimate convolutional block of Keras 'imagenet' pretrained VGG-16 model and a custom feature extractor made with Planet data. The main advances of the k-textures model are: first, the $k$ discrete binary masks are obtained inside the model using gradient descent. The model allows for the generation of discrete binary masks using a novel method using a hard sigmoid activation function. Second, it provides hard clustering classes -- each pixel has only one class. Finally, in comparison to k-means, where each pixel is considered independently, here, contextual information is also considered and each class is not associated only to similar values in the color channels but also to a texture. Our approach is designed to ease the production of training samples for satellite image segmentation and the k-textures architecture could be adapted to support different number of bands and for more complex tasks, such as object self-segmentation. The model codes and weights are available at https://doi.org/10.5281/zenodo.6359859

Submitted 27 May, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: 19 pages, 10 figures, submitted to Frontiers in Environmental Science, section Environmental Informatics and Remote Sensing, Research Topic: Advances in Machine Learning and Deep Learning for Monitoring Terrestrial Ecosystems

ACM Class: I.4.6; I.5.3

arXiv:2203.09087 [pdf, other]

GPU Computation of the Euler Characteristic Curve for Imaging Data

Authors: Fan Wang, Hubert Wagner, Chao Chen

Abstract: Persistent homology is perhaps the most popular and useful tool offered by topological data analysis, with point-cloud data being the most common setup. Its older cousin, the Euler characteristic curve (ECC) is less expressive, but far easier to compute. It is particularly suitable for analyzing imaging data, and is commonly used in fields ranging from astrophysics to biomedical image analysis. These fields are embracing GPU computations to handle increasingly large datasets. We therefore propose an optimized GPU implementation of ECC computation for 2D and 3D grayscale images. The goal of this paper is twofold. First, we offer a practical tool, illustrating its performance with thorough experimentation, but also explain its inherent shortcomings. Second, this simple algorithm serves as a perfect backdrop for highlighting basic GPU programming techniques that make our implementation so efficient, and some common pitfalls we avoided. This is intended as a step towards a wider usage of GPU programming in computational geometry and topology software. We find this is particularly important as geometric and topological tools are used in conjunction with modern, GPU-accelerated machine learning frameworks.

Submitted 3 March, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: 17 pages, 7 figures, SoCG2022
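
For readers unfamiliar with the Euler characteristic curve mentioned in the abstract above, the following is a small CPU-only sketch for a 2D grayscale image. It is not the paper's GPU implementation. It assumes each pixel is a closed unit square, that the sublevel set at threshold `t` is the union of squares with value at most `t`, and that the Euler characteristic is computed as vertices minus edges plus faces; the function names are hypothetical.

```python
def euler_characteristic(img, t):
    """Euler characteristic (V - E + F) of the union of closed pixel squares with value <= t."""
    rows, cols = len(img), len(img[0])

    def on(r, c):
        # A pixel is active if it lies inside the image and its value is at most t.
        return 0 <= r < rows and 0 <= c < cols and img[r][c] <= t

    faces = sum(on(r, c) for r in range(rows) for c in range(cols))
    # A grid vertex is present if any of its (up to four) adjacent pixels is active.
    vertices = sum(
        on(i - 1, j - 1) or on(i - 1, j) or on(i, j - 1) or on(i, j)
        for i in range(rows + 1)
        for j in range(cols + 1)
    )
    # An edge is present if either of its (up to two) adjacent pixels is active.
    h_edges = sum(on(i - 1, j) or on(i, j) for i in range(rows + 1) for j in range(cols))
    v_edges = sum(on(i, j - 1) or on(i, j) for i in range(rows) for j in range(cols + 1))
    return vertices - (h_edges + v_edges) + faces


def euler_characteristic_curve(img, thresholds):
    """Evaluate the Euler characteristic of the sublevel set at each threshold."""
    return [euler_characteristic(img, t) for t in thresholds]


# A dark ring (value 0) around a brighter centre pixel (value 1).
ring = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
curve = euler_characteristic_curve(ring, [-1, 0, 1])
```

At threshold 0 the sublevel set is a ring (one component, one hole), and at threshold 1 the hole is filled. Recomputing the complex per threshold is deliberately naive; a practical implementation would accumulate per-pixel contributions in a single pass.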

arXiv:2112.07051 [pdf]

doi 10.1093/database/baac035

A Simple Standard for Sharing Ontological Mappings (SSSOM)

Authors: Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, John Graybeal, Alasdair Gray, Benjamin M. Gyori, Melissa Haendel, Henriette Harmse, Nomi L. Harris, Ian Harrow, Harshad Hegde, Amelia L. Hoyt, Charles T. Hoyt, Dazhi Jiao, Ernesto Jiménez-Ruiz, Simon Jupp, Hyeongsik Kim, Sebastian Koehler, et al. (19 additional authors not shown)

Abstract: Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy to use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.

Submitted 13 December, 2021; originally announced December 2021.

Comments: Corresponding author: Christopher J. Mungall

arXiv:2112.04854 [pdf, other]

Properties of Large 2-Crossing-Critical Graphs

Authors: Drago Bokal, Markus Chimani, Alexander Nover, Jöran Schierbaum, Tobias Stolzmann, Mirko H. Wagner, Tilo Wiedera

Abstract: A $c$-crossing-critical graph is one that has crossing number at least $c$ but each of its proper subgraphs has crossing number less than $c$. Recently, a set of explicit construction rules was identified by Bokal, Oporowski, Richter, and Salazar to generate all large $2$-crossing-critical graphs (i.e., all apart from a finite set of small sporadic graphs). They share the property of containing a generalized Wagner graph $V_{10}$ as a subdivision. In this paper, we study these graphs and establish their order, simple crossing number, edge cover number, clique number, maximum degree, chromatic number, chromatic index, and treewidth. We also show that the graphs are linear-time recognizable and that all our proofs lead to efficient algorithms for the above measures.

Submitted 9 December, 2021; originally announced December 2021.

Comments: 29 pages, 14 figures

arXiv:2106.06469 [pdf, other]

Topological Detection of Trojaned Neural Networks

Authors: Songzhu Zheng, Yikai Zhang, Hubert Wagner, Mayank Goswami, Chao Chen

Abstract: Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model's behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles we discover subtle -- yet critical -- structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from input to output layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2002.07012 [pdf, other]

doi 10.1007/978-3-030-53262-8_8

An Experimental Study of ILP Formulations for the Longest Induced Path Problem

Authors: Fritz Bökler, Markus Chimani, Mirko H. Wagner, Tilo Wiedera

Abstract: Given a graph $G=(V,E)$, the longest induced path problem asks for a maximum cardinality node subset $W\subseteq V$ such that the graph induced by $W$ is a path. It is a long established problem with applications, e.g., in network analysis. We propose novel integer linear programming (ILP) formulations for the problem and discuss efficient implementations thereof. Comparing them with known formulations from literature, we prove that they are beneficial in theory, yielding stronger relaxations. Moreover, our experiments show their practical superiority.

Submitted 17 October, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:1903.08510 [pdf, other]

Topological Data Analysis in Information Space

Authors: Herbert Edelsbrunner, Ziga Virk, Hubert Wagner

Abstract: Various kinds of data are routinely represented as discrete probability distributions. Examples include text documents summarized by histograms of word occurrences and images represented as histograms of oriented gradients. Viewing a discrete probability distribution as a point in the standard simplex of the appropriate dimension, we can understand collections of such objects in geometric and topological terms. Importantly, instead of using the standard Euclidean distance, we look into dissimilarity measures with information-theoretic justification, and we develop the theory needed for applying topological data analysis in this setting. In doing so, we emphasize constructions that enable usage of existing computational topology software in this context.

Submitted 28 March, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

Journal ref: Full version of the paper published in proceedings of the 35th International Symposium on Computational Geometry (SoCG 2019)

arXiv:1705.02045 [pdf, other]

doi 10.1007/978-3-319-64689-3_32

Streaming Algorithm for Euler Characteristic Curves of Multidimensional Images

Authors: Teresa Heiss, Hubert Wagner

Abstract: We present an efficient algorithm to compute Euler characteristic curves of gray scale images of arbitrary dimension. In various applications the Euler characteristic curve is used as a descriptor of an image. Our algorithm is the first streaming algorithm for Euler characteristic curves. The usage of streaming removes the necessity to store the entire image in RAM. Experiments show that our implementation handles terabyte scale images on commodity hardware. Due to lock-free parallelism, it scales well with the number of processor cores. Our software---CHUNKYEuler---is available as open source on Bitbucket. Additionally, we put the concept of the Euler characteristic curve in the wider context of computational topology. In particular, we explain the connection with persistence diagrams.

Submitted 17 October, 2018; v1 submitted 4 May, 2017; originally announced May 2017.

Journal ref: In: Computer Analysis of Images and Patterns. CAIP 2017. Lecture Notes in Computer Science, vol 10424. Springer International Publishing, Cham, 2017, pp. 397-409

arXiv:1607.06344 [pdf, other]

Solving equations and optimization problems with uncertainty

Authors: Peter Franek, Marek Krčál, Hubert Wagner

Abstract: We study the problem of detecting zeros of continuous functions that are known only up to an error bound, extending the earlier theoretical work with explicit algorithms and experiments with an implementation. More formally, the robustness of zero of a continuous map $f: X\to \mathbb{R}^n$ is the maximal $r>0$ such that each $g:X\to\mathbb{R}^n$ with $\|f-g\|_\infty\le r$ has a zero. We develop and implement an efficient algorithm approximating the robustness of zero. Further, we show how to use the algorithm for approximating worst-case optima in optimization problems in which the feasible domain is defined by equations that are only known approximately. An important ingredient is an algorithm for deciding the topological extension problem based on computing cohomological obstructions to extendability and their persistence. We describe an explicit algorithm for the primary and secondary obstruction, two stages of a sequence of algorithms with increasing complexity. We provide experimental evidence that for random Gaussian fields, the primary obstruction---a much less computationally demanding test than the secondary obstruction---is typically sufficient for approximating robustness of zero.

Submitted 27 September, 2017; v1 submitted 21 July, 2016; originally announced July 2016.

MSC Class: 65H10; 45F99 ACM Class: G.1.5; F.2.2

arXiv:1607.06274 [pdf, other]

Topological Data Analysis with Bregman Divergences

Authors: Herbert Edelsbrunner, Hubert Wagner

Abstract: Given a finite set in a metric space, the topological analysis generalizes hierarchical clustering using a 1-parameter family of homology groups to quantify connectivity in all dimensions. The connectivity is compactly described by the persistence diagram. One limitation of the current framework is the reliance on metric distances, whereas in many practical applications objects are compared by non-metric dissimilarity measures. Examples are the Kullback-Leibler divergence, which is commonly used for comparing text and images, and the Itakura-Saito divergence, popular for speech and sound. These are two members of the broad family of dissimilarities called Bregman divergences. We show that the framework of topological data analysis can be extended to general Bregman divergences, widening the scope of possible applications. In particular, we prove that appropriately generalized Cech and Delaunay (alpha) complexes capture the correct homotopy type, namely that of the corresponding union of Bregman balls. Consequently, their filtrations give the correct persistence diagram, namely the one generated by the uniformly growing Bregman balls. Moreover, we show that unlike the metric setting, the filtration of Vietoris-Rips complexes may fail to approximate the persistence diagram. We propose algorithms to compute the thus generalized Cech, Vietoris-Rips and Delaunay complexes and experimentally test their efficiency. Lastly, we explain their surprisingly good performance by making a connection with discrete Morse theory.

Submitted 21 July, 2016; originally announced July 2016.

arXiv:1210.1429 [pdf, other]

Computing homology and persistent homology using iterated Morse decomposition

Authors: Paweł Dłotko, Hubert Wagner

Abstract: In this paper we present a new approach to computing homology (with field coefficients) and persistent homology. We use concepts from discrete Morse theory, to provide an algorithm which can be expressed solely in terms of simple graph theoretical operations. We use iterated Morse decomposition, which allows us to sidetrack many problems related to the standard discrete Morse theory. In particular, this approach is provably correct in any dimension.

Submitted 25 October, 2012; v1 submitted 4 October, 2012; originally announced October 2012.

Showing 1–24 of 24 results for author: Wagner, H