-
Attending to Topological Spaces: The Cellular Transformer
Authors:
Rubén Ballester,
Pablo Hernández-García,
Mathilde Papillon,
Claudio Battiloro,
Nina Miolane,
Tolga Birdal,
Carles Casacuberta,
Sergio Escalera,
Mustafa Hajij
Abstract:
Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data. Topological neural networks operate on spaces such as cell complexes and hypergraphs, that can be seen as generalizations of graphs. In this work, we introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers to cell complexes. First, we propose a new formulation of the usual self- and cross-attention mechanisms, tailored to leverage incidence relations in cell complexes, e.g., edge-face and node-edge relations. Additionally, we propose a set of topological positional encodings specifically designed for cell complexes. By transforming three graph datasets into cell complex datasets, our experiments reveal that CT not only achieves state-of-the-art performance, but it does so without the need for more complex enhancements such as virtual nodes, in-domain structural encodings, or graph rewiring.
Submitted 26 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Position: Topological Deep Learning is the New Frontier for Relational Learning
Authors:
Theodore Papamarkou,
Tolga Birdal,
Michael Bronstein,
Gunnar Carlsson,
Justin Curry,
Yue Gao,
Mustafa Hajij,
Roland Kwitt,
Pietro Liò,
Paolo Di Lorenzo,
Vasileios Maroulas,
Nina Miolane,
Farzana Nasrin,
Karthikeyan Natesan Ramamurthy,
Bastian Rieck,
Simone Scardapane,
Michael T. Schaub,
Petar Veličković,
Bei Wang,
Yusu Wang,
Guo-Wei Wei,
Ghada Zamzmi
Abstract:
Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.
Submitted 6 August, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains
Authors:
Mustafa Hajij,
Mathilde Papillon,
Florian Frantzen,
Jens Agerberg,
Ibrahem AlJabea,
Rubén Ballester,
Claudio Battiloro,
Guillermo Bernárdez,
Tolga Birdal,
Aiden Brent,
Peter Chin,
Sergio Escalera,
Simone Fiorellino,
Odin Hoff Gardaa,
Gurusankar Gopalakrishnan,
Devendra Govil,
Josef Hoppe,
Maneel Reddy Karri,
Jude Khouja,
Manuel Lecha,
Neal Livesay,
Jan Meißner,
Soham Mukherjee,
Alexander Nikitin,
Theodore Papamarkou, et al. (18 additional authors not shown)
Abstract:
We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at https://pyt-team.github.io/.
Submitted 8 December, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Topo-MLP: A Simplicial Network Without Message Passing
Authors:
Karthikeyan Natesan Ramamurthy,
Aldo Guzmán-Sáenz,
Mustafa Hajij
Abstract:
Due to their ability to model meaningful higher-order relations among a set of entities, higher-order network models have recently emerged as a powerful alternative to graph-based network models, which are only capable of modeling binary relationships. The message passing paradigm is still predominantly used to learn representations, even for higher-order network models. While powerful, message passing can have disadvantages during inference, particularly when the higher-order connectivity information is missing or corrupted. To overcome such limitations, we propose Topo-MLP, a purely MLP-based simplicial neural network algorithm to learn the representation of elements in a simplicial complex without explicitly relying on message passing. Our framework utilizes a novel Higher Order Neighborhood Contrastive (HONC) loss which implicitly incorporates the simplicial structure into representation learning. Our proposed model's simplicity makes it faster during inference. Moreover, we show that our model is robust when faced with missing or corrupted connectivity structure.
Submitted 19 December, 2023;
originally announced December 2023.
-
Combinatorial Complexes: Bridging the Gap Between Cell Complexes and Hypergraphs
Authors:
Mustafa Hajij,
Ghada Zamzmi,
Theodore Papamarkou,
Aldo Guzmán-Sáenz,
Tolga Birdal,
Michael T. Schaub
Abstract:
Graph-based signal processing techniques have become essential for handling data in non-Euclidean spaces. However, there is a growing awareness that these graph models might need to be expanded into 'higher-order' domains to effectively represent the complex relations found in high-dimensional data. Such higher-order domains are typically modeled either as hypergraphs, or as simplicial, cubical or other cell complexes. In this context, cell complexes are often seen as a subclass of hypergraphs with additional algebraic structure that can be exploited, e.g., to develop a spectral theory. In this article, we promote an alternative perspective. We argue that hypergraphs and cell complexes emphasize different types of relations, which may have different utility depending on the application context. Whereas hypergraphs are effective in modeling set-type, multi-body relations between entities, cell complexes provide an effective means to model hierarchical, interior-to-boundary type relations. We discuss the relative advantages of these two choices and elaborate on the previously introduced concept of a combinatorial complex that enables co-existing set-type and hierarchical relations. Finally, we provide a brief numerical experiment to demonstrate that this modelling flexibility can be advantageous in learning tasks.
Submitted 14 December, 2023;
originally announced December 2023.
-
Topological Deep Learning: Going Beyond Graph Data
Authors:
Mustafa Hajij,
Ghada Zamzmi,
Theodore Papamarkou,
Nina Miolane,
Aldo Guzmán-Sáenz,
Karthikeyan Natesan Ramamurthy,
Tolga Birdal,
Tamal K. Dey,
Soham Mukherjee,
Shreyas N. Samaga,
Neal Livesay,
Robin Walters,
Paul Rosen,
Michael T. Schaub
Abstract:
Topological deep learning is a rapidly growing field that pertains to the development of deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize many domains encountered in scientific computations. In this paper, we present a unifying deep learning framework built upon a richer data structure that includes widely adopted topological domains.
Specifically, we first introduce combinatorial complexes, a novel type of topological domain. Combinatorial complexes can be seen as generalizations of graphs that maintain certain desirable properties. Similar to hypergraphs, combinatorial complexes impose no constraints on the set of relations. In addition, combinatorial complexes permit the construction of hierarchical higher-order relations, analogous to those found in simplicial and cell complexes. Thus, combinatorial complexes generalize and combine useful traits of both hypergraphs and cell complexes, which have emerged as two promising abstractions that facilitate the generalization of graph neural networks to topological spaces.
Second, building upon combinatorial complexes and their rich combinatorial and algebraic structure, we develop a general class of message-passing combinatorial complex neural networks (CCNNs), focusing primarily on attention-based CCNNs. We characterize permutation and orientation equivariances of CCNNs, and discuss pooling and unpooling operations within CCNNs in detail.
Third, we evaluate the performance of CCNNs on tasks related to mesh shape analysis and graph learning. Our experiments demonstrate that CCNNs have competitive performance as compared to state-of-the-art deep learning models specifically tailored to the same tasks. Our findings demonstrate the advantages of incorporating higher-order relations into deep learning models in different applications.
Submitted 19 May, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Data-Centric AI Requires Rethinking Data Notion
Authors:
Mustafa Hajij,
Ghada Zamzmi,
Karthikeyan Natesan Ramamurthy,
Aldo Guzman Saenz
Abstract:
The transition towards data-centric AI requires revisiting data notions from mathematical and implementational standpoints to obtain unified data-centric machine learning packages. Towards this end, this work proposes unifying principles offered by categorical and cochain notions of data, and discusses the importance of these principles in the transition to data-centric AI. In the categorical notion, data is viewed as a mathematical structure that we act upon via morphisms to preserve this structure. In the cochain notion, data is viewed as a function defined on a discrete domain of interest and acted upon via operators. While these notions are almost orthogonal, they provide a unifying definition to view data, ultimately impacting the way machine learning packages are developed, implemented, and utilized by practitioners.
Submitted 2 December, 2021; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Simplicial Complex Representation Learning
Authors:
Mustafa Hajij,
Ghada Zamzmi,
Theodore Papamarkou,
Vasileios Maroulas,
Xuanting Cai
Abstract:
Simplicial complexes form an important class of topological spaces that are frequently used in many application areas such as computer-aided design, computer graphics, and simulation. Representation learning on graphs, which are just 1-d simplicial complexes, has received great attention in recent years. However, there has not been enough effort to extend representation learning to higher dimensional simplicial objects due to the additional complexity these objects hold, especially when it comes to entire-simplicial complex representation learning. In this work, we propose a method for simplicial complex-level representation learning that embeds a simplicial complex into a universal embedding space in a way that complex-to-complex proximity is preserved. Our method uses our novel geometric message passing schemes to learn an entire simplicial complex representation in an end-to-end fashion. We demonstrate the proposed model on a publicly available mesh dataset. To the best of our knowledge, this work presents the first method for learning simplicial complex-level representations.
Submitted 1 February, 2022; v1 submitted 6 March, 2021;
originally announced March 2021.
-
Topological Deep Learning: Classification Neural Networks
Authors:
Mustafa Hajij,
Kyle Istvan
Abstract:
Topological deep learning is a formalism that is aimed at introducing topological language to deep learning for the purpose of utilizing the minimal mathematical structures to formalize problems that arise in a generic deep learning problem. This is the first of a sequence of articles with the purpose of introducing and studying this formalism. In this article, we define and study the classification problem in machine learning in a topological setting. Using this topological framework, we show when the classification problem is possible or not possible in the context of neural networks. Finally, we demonstrate how our topological setting immediately illuminates aspects of this problem that are not as readily apparent using traditional tools.
Submitted 16 February, 2021;
originally announced February 2021.
-
Cell Complex Neural Networks
Authors:
Mustafa Hajij,
Kyle Istvan,
Ghada Zamzmi
Abstract:
Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. They also provide a combinatorial formalism that allows the inclusion of complicated relationships of restrictive structures such as graphs and meshes. In this paper, we propose Cell Complexes Neural Networks (CXNs), a general, combinatorial and unifying construction for performing neural network-type computations on cell complexes. We introduce an inter-cellular message passing scheme on cell complexes that takes the topology of the underlying space into account and generalizes the message passing scheme on graphs. Finally, we introduce a unified cell complex encoder-decoder framework that enables learning representations of the cells of a given complex inside Euclidean spaces. In particular, we show how our cell complex autoencoder construction can give, in the special case of cell2vec, a generalization of node2vec.
Submitted 1 March, 2021; v1 submitted 1 October, 2020;
originally announced October 2020.
-
A Topological Framework for Deep Learning
Authors:
Mustafa Hajij,
Kyle Istvan
Abstract:
We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective.
Submitted 21 June, 2021; v1 submitted 31 August, 2020;
originally announced August 2020.
-
PageRank and The K-Means Clustering Algorithm
Authors:
Mustafa Hajij,
Eyad Said,
Robert Todd
Abstract:
We utilize the PageRank vector to generalize the $k$-means clustering algorithm to directed and undirected graphs. We demonstrate that PageRank and other centrality measures can be used in our setting to robustly compute centrality of nodes in a given graph. Furthermore, we show how our method can be generalized to metric spaces and apply it to other domains such as point clouds and triangulated meshes.
Submitted 9 March, 2021; v1 submitted 10 May, 2020;
originally announced May 2020.
-
Fast and Scalable Complex Network Descriptor Using PageRank and Persistent Homology
Authors:
Mustafa Hajij,
Elizabeth Munch,
Paul Rosen
Abstract:
The PageRank of a graph is a scalar function defined on the node set of the graph which encodes node centrality information. In this article, we use the PageRank function along with persistent homology to obtain a scalable graph descriptor and utilize it to compare the similarities between graphs. For a given graph $G(V,E)$, our descriptor can be computed in $O(|E|\alpha(|V|))$ time, where $\alpha$ is the inverse Ackermann function, which makes it scalable and computable on massive graphs. We show the effectiveness of our method by utilizing it on multiple shape mesh datasets.
Submitted 11 September, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Mesh Learning Using Persistent Homology on the Laplacian Eigenfunctions
Authors:
Yunhao Zhang,
Haowen Liu,
Paul Rosen,
Mustafa Hajij
Abstract:
We use persistent homology along with the eigenfunctions of the Laplacian to study similarity amongst triangulated 2-manifolds. Our method relies on studying the lower-star filtration induced by the eigenfunctions of the Laplacian. This gives us a shape descriptor that inherits the rich information encoded in the eigenfunctions of the Laplacian. Moreover, the similarity between these descriptors can be easily computed using tools that are readily available in Topological Data Analysis. We provide experiments to illustrate the effectiveness of the proposed method.
Submitted 23 April, 2019; v1 submitted 21 April, 2019;
originally announced April 2019.
-
Integrating Project Spatial Coordinates into Pavement Management Prioritization
Authors:
Shadi Hanandeh,
Omar Elbagalati,
Mustafa Hajij
Abstract:
To date, pavement management software products and studies on optimizing the prioritization of pavement maintenance and rehabilitation (M&R) have mainly focused on three parameters: the pre-treatment pavement condition, the rehabilitation cost, and the available budget. Yet, the role of the candidate projects' spatial characteristics in the decision-making process has not been deeply considered. Such a limitation often allows the recommended M&R project schedule to involve simultaneously running but spatially scattered construction sites, which are very challenging to monitor and manage. This study introduces a novel approach to integrate pavement segments' spatial coordinates into the M&R prioritization analysis. The introduced approach aims at combining the pavement segments with converged spatial coordinates to be repaired in the same timeframe without compromising the allocated budget levels or the overall target Pavement Condition Index (PCI). Such a combination would minimize the routing of crews, materials and other equipment among the construction sites and would provide better collaboration and communication between the pavement maintenance teams. Proposed herein is a novel spatial clustering algorithm that automatically finds the projects within certain budget and spatial constraints. The developed algorithm was successfully validated using 1,800 pavement maintenance projects from two real-life examples: the City of Milton, GA and the City of Tyler, TX.
Submitted 9 June, 2025; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Homology-Preserving Multi-Scale Graph Skeletonization Using Mapper on Graphs
Authors:
Paul Rosen,
Mustafa Hajij,
Bei Wang
Abstract:
Node-link diagrams are a popular method for representing graphs that capture relationships between individuals, businesses, proteins, and telecommunication endpoints. However, node-link diagrams may fail to convey insights regarding graph structures, even for moderately sized data of a few hundred nodes, due to visual clutter. We propose to apply the mapper construction -- a popular tool in topological data analysis -- to graph visualization, which provides a strong theoretical basis for summarizing the data while preserving their core structures. We develop a variation of the mapper construction targeting weighted, undirected graphs, called Mapper on Graphs, which generates homology-preserving skeletons of graphs. We further show how the adjustment of a single parameter enables multi-scale skeletonization of the input graph. We provide a software tool that enables interactive explorations of such skeletons and demonstrate the effectiveness of our method for synthetic and real-world data.
Submitted 19 September, 2023; v1 submitted 3 April, 2018;
originally announced April 2018.
-
Graph Based Analysis for Gene Segment Organization In a Scrambled Genome
Authors:
Mustafa Hajij,
Nataša Jonoska,
Denys Kukushkin,
Masahico Saito
Abstract:
DNA rearrangement processes recombine gene segments that are organized on the chromosome in a variety of ways. The segments can overlap, interleave or one may be a subsegment of another. We use directed graphs to represent segment organizations on a given locus where contigs containing rearranged segments represent vertices and the edges correspond to the segment relationships. Using graph properties we associate a point in a higher dimensional Euclidean space to each graph such that cluster formations and analysis can be performed with methods from topological data analysis. The method is applied to a recently sequenced model organism, Oxytricha trifallax, a species of ciliate with a highly scrambled genome that undergoes a massive rearrangement process after conjugation. The analysis shows some emerging star-like graph structures indicating that segments of a single gene can interleave, or even contain all of the segments from fifteen or more other genes in between its segments. We also observe that as many as six genes can have their segments mutually interleaving or overlapping.
Submitted 28 January, 2018; v1 submitted 17 January, 2018;
originally announced January 2018.
-
Parallel Mapper
Authors:
Mustafa Hajij,
Basem Assiri,
Paul Rosen
Abstract:
The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, and split and join trees. In this paper, we study the parallel analysis of the construction of Mapper. We give a provably correct parallel algorithm to execute Mapper on multiple processors and discuss the performance results that compare our approach to a reference sequential Mapper implementation. We report the performance experiments that demonstrate the efficiency of our method.
Submitted 11 May, 2020; v1 submitted 11 December, 2017;
originally announced December 2017.