-
Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training
Authors:
Aditya K. Ranjan,
Siddharth Singh,
Cunyang Wei,
Abhinav Bhatele
Abstract:
Graph neural networks have emerged as a potent class of neural networks capable of leveraging the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and using GNNs on them requires techniques such as mini-batch sampling to scale. However, this can lead to reduced accuracy in some cases, and sampling and data transfer from the CPU to the GPU can also slow down training. On the other hand, distributed full-graph training suffers from high communication overhead and load imbalance due to the irregular structure of graphs. We propose Plexus, a three-dimensional (3D) parallel approach for full-graph training that tackles these issues and scales to billion-edge graphs. Additionally, we introduce optimizations such as a permutation scheme for load balancing, and a performance model to predict the optimal 3D configuration. We evaluate Plexus on several graph datasets and show scaling results for up to 2048 GPUs on Perlmutter, which is 33% of the machine, and 2048 GCDs on Frontier. Plexus achieves unprecedented speedups of 2.3x-12.5x over existing methods and a reduction in the time to solution by 5.2-8.7x on Perlmutter and 7-54.2x on Frontier.
Submitted 6 May, 2025;
originally announced May 2025.
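The Plexus abstract mentions a performance model that predicts the optimal 3D configuration. Assuming, for illustration only, that a configuration is an ordered factorization of the GPU count G into a grid X × Y × Z (a common convention in 3D tensor parallelism that the abstract does not confirm), the candidate set such a model would rank can be enumerated as below. The helper name `grid_configs` is hypothetical, and the cost model itself is not reproduced.

```python
def grid_configs(num_gpus):
    """Return every ordered (x, y, z) with x * y * z == num_gpus.

    Hypothetical helper: a 3D configuration is assumed to be an ordered
    factorization of the GPU count into three grid dimensions.
    """
    configs = []
    for x in range(1, num_gpus + 1):
        if num_gpus % x:
            continue  # x must divide the GPU count
        rest = num_gpus // x
        for y in range(1, rest + 1):
            if rest % y == 0:
                configs.append((x, y, rest // y))
    return configs


if __name__ == "__main__":
    # 2048 = 2**11, so there are (11 + 1) * (11 + 2) / 2 = 78 ordered triples.
    print(len(grid_configs(2048)))
```

A performance model would then attach a predicted cost to each triple and pick the cheapest; exhaustive enumeration stays cheap because, for a power-of-two GPU count 2^k, there are only (k + 1)(k + 2)/2 triples.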
-
Optimal Capacity Modification for Stable Matchings with Ties
Authors:
Keshav Ranjan,
Meghana Nasre,
Prajakta Nimbhorkar
Abstract:
We consider the Hospitals/Residents (HR) problem in the presence of ties in preference lists. Among the three notions of stability, viz. weak, strong, and super stability, we focus on the notion of strong stability. Strong stability has many desirable properties, both theoretically and practically; however, its existence is not guaranteed. In this paper, our objective is to optimally increase the quotas of hospitals to ensure that a strongly stable matching exists in the modified instance. First, we show that if ties are allowed in residents' preference lists, it may not be possible to augment the hospital quotas to obtain an instance that admits a strongly stable matching. When residents' preference lists are strict, we explore two natural optimization criteria: (i) minimizing the total capacity increase across all hospitals (MINSUM) and (ii) minimizing the maximum capacity increase for any hospital (MINMAX). We show that the MINSUM problem admits a polynomial-time algorithm, whereas the MINMAX problem is NP-hard. We prove an analogue of the Rural Hospitals theorem for the MINSUM problem. When each hospital incurs a cost for a unit increase in its quota, the MINSUM problem becomes NP-hard, even for 0/1 costs. In fact, we show that the problem cannot be approximated to any multiplicative factor. We also present a polynomial-time algorithm for optimal MINSUM augmentation when a specified subset of edges is required to be included in the matching. We show that the MINMAX problem is NP-hard in general. When hospital preference lists have ties of length at most $\ell+1$, we give a polynomial-time algorithm that increases each hospital's quota by at most $\ell$. Amongst all instances obtained by at most $\ell$ augmentations per hospital, our algorithm produces a strongly stable matching that is best for residents.
Submitted 23 May, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
3D Guidance Law for Flexible Target Enclosing with Inherent Safety
Authors:
Praveen Kumar Ranjan,
Abhinav Sinha,
Yongcan Cao
Abstract:
In this paper, we address the problem of enclosing an arbitrarily moving target in three dimensions by a single pursuer while ensuring the pursuer's safety by preventing collisions with the target. The proposed guidance strategy steers the pursuer to a safe region of space surrounding and excluding the target, allowing it to maintain a certain distance from the latter while offering greater flexibility in positioning and converging to any orbit within this safe zone. We leverage the concept of the Lyapunov Barrier Function as a powerful tool to constrain the distance between the pursuer and the target within asymmetric bounds, thereby ensuring the pursuer's safety within the predefined region. Further, we demonstrate the effectiveness of the proposed guidance law in managing arbitrarily maneuvering targets and other uncertainties (such as vehicle/autopilot dynamics and external disturbances) by enabling the pursuer to consistently achieve stable global enclosing behaviors by switching between stable enclosing trajectories within the safe region whenever necessary, even in response to aggressive target maneuvers. To attest to the merits of our work, we conduct experimental tests with various plant models, including a high-fidelity quadrotor model within Software-in-the-loop (SITL) simulations, encompassing various challenging target maneuver scenarios and requiring only relative information for successful execution.
Submitted 17 October, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
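The guidance-law abstract uses a Lyapunov Barrier Function to keep the pursuer-target distance within asymmetric bounds. The abstract does not give the authors' function, so the sketch below uses a standard asymmetric log-barrier form from the constrained-control literature as an illustration. Here `e` is assumed to be the error between the current distance and a desired enclosing radius, and `k_a` and `k_b` are hypothetical parameters bounding how far the distance may fall below or rise above that radius.

```python
import math


def asymmetric_blf(e, k_a, k_b):
    """Asymmetric log-barrier Lyapunov candidate on the open interval (-k_a, k_b).

    Illustrative form only, not necessarily the one used in the paper.
    The value is zero at e = 0 and grows without bound as e approaches
    either bound, so keeping it finite keeps the distance inside the band.
    """
    if not (-k_a < e < k_b):
        raise ValueError("error is outside the safe band")
    k = k_b if e > 0 else k_a  # use the bound on the side where e lies
    return 0.5 * math.log(k * k / (k * k - e * e))


if __name__ == "__main__":
    for e in (0.0, 0.9, 1.9, -0.9):
        print(e, asymmetric_blf(e, k_a=1.0, k_b=2.0))
```

Because the two sides use different bounds, the same magnitude of error costs more on the tighter side (here, approaching the target) than on the looser side.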
-
Fabricating Paper Circuits with Subtractive Processing
Authors:
Ruhan Yang,
Krithik Ranjan,
Ellen Yi-Luen Do
Abstract:
This paper introduces a new method of paper circuit fabrication that overcomes design barriers and increases flexibility in circuit design. Conventional circuit boards rely on thin traces, which limits the complexity and accuracy when applied to paper circuits. To address this issue, we propose a method that uses large conductive zones in paper circuits and performs subtractive processing during their fabrication. This approach eliminates design barriers and allows for more flexibility in circuit design. We introduce PaperCAD, a software tool that simplifies the design process by converting traditional circuit design to paper circuit design. We demonstrate our technique by creating two paper circuit boards. Our approach has the potential to promote the development of new applications for paper circuits.
Submitted 10 April, 2024;
originally announced April 2024.
-
Self-organizing Multiagent Target Enclosing under Limited Information and Safety Guarantees
Authors:
Praveen Kumar Ranjan,
Abhinav Sinha,
Yongcan Cao
Abstract:
This paper introduces an approach to address the target enclosing problem using non-holonomic multiagent systems, where agents self-organize on the enclosing shape around a fixed target. In our approach, agents independently move toward the desired enclosing geometry when apart and activate the collision avoidance mechanism when a collision is imminent, thereby guaranteeing inter-agent safety. Our approach combines global enclosing behavior and local collision avoidance mechanisms by devising a special potential function and sliding manifold. We rigorously show that an agent does not need to ensure safety with every other agent and put forth a concept of the nearest colliding agent (for any arbitrary agent) with whom ensuring safety is sufficient to avoid collisions in the entire swarm. The proposed control eliminates the need for a fixed or pre-established agent arrangement around the target and requires only relative information between an agent and the target. This makes our design particularly appealing for scenarios with limited global information, hence significantly reducing communication requirements. We finally present simulation results to vindicate the efficacy of the proposed method.
Submitted 15 August, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
DualStream: Spatially Sharing Selves and Surroundings using Mobile Devices and Augmented Reality
Authors:
Rishi Vanukuru,
Suibi Che-Chuan Weng,
Krithik Ranjan,
Torin Hopkins,
Amy Banic,
Mark D. Gross,
Ellen Yi-Luen Do
Abstract:
In-person human interaction relies on our spatial perception of each other and our surroundings. Current remote communication tools partially address each of these aspects. Video calls convey real user representations but without spatial interactions. Augmented and Virtual Reality (AR/VR) experiences are immersive and spatial but often use virtual environments and characters instead of real-life representations. Bridging these gaps, we introduce DualStream, a system for synchronous mobile AR remote communication that captures, streams, and displays spatial representations of users and their surroundings. DualStream supports transitions between user and environment representations with different levels of visuospatial fidelity, as well as the creation of persistent shared spaces using environment snapshots. We demonstrate how DualStream can enable spatial communication in real-world contexts, and support the creation of blended spaces for collaboration. A formative evaluation of DualStream revealed that users valued the ability to interact spatially and move between representations, and could see DualStream fitting into their own remote communication practices in the near future. Drawing from these findings, we discuss new opportunities for designing more widely accessible spatial communication tools, centered around the mobile phone.
Submitted 2 September, 2023;
originally announced September 2023.
-
Pipit: Scripting the analysis of parallel execution traces
Authors:
Abhinav Bhatele,
Rakrish Dhakal,
Alexander Movsesyan,
Aditya K. Ranjan,
Onur Cankur
Abstract:
Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program execution to identify different kinds of performance issues. Trace collection tools often provide a graphical tool to analyze the trace output. However, these GUI-based tools only support specific file formats, are challenging to scale to large trace sizes, limit data exploration to the implemented graphical views, and do not support automated comparisons of two or more datasets. In this paper, we present a programmatic approach to analyzing parallel execution traces by leveraging pandas, a powerful Python-based data analysis library. We have developed a Python library, Pipit, on top of pandas that can read traces in different file formats (OTF2, HPCToolkit, Projections, Nsight Systems, etc.) and provides a uniform data structure in the form of a pandas DataFrame. Pipit provides operations to aggregate, filter, and transform the events in a trace to present the data in different ways. We also provide several functions to quickly and easily identify performance issues in parallel executions. More importantly, the API is easily extensible to support custom analyses by different end users.
Submitted 14 May, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
Authors:
Siddharth Singh,
Prajwal Singhania,
Aditya K. Ranjan,
Zack Sating,
Abhinav Bhatele
Abstract:
Heavy communication, in particular collective operations, can become a critical performance bottleneck in scaling the training of billion-parameter neural networks to large-scale parallel systems. This paper introduces a four-dimensional (4D) approach to optimize communication in parallel training. This 4D approach is a hybrid of 3D tensor and data parallelism, and is implemented in the AxoNN framework. In addition, we employ two key strategies to further minimize communication overheads. First, we aggressively overlap expensive collective operations (reduce-scatter, all-gather, and all-reduce) with computation. Second, we develop an analytical model to identify high-performing configurations within the large search space defined by our 4D algorithm. This model simplifies the tuning process for practitioners' specific training workloads. When training an 80-billion-parameter GPT on 1024 GPUs of Perlmutter, AxoNN outperforms Megatron-LM, a state-of-the-art framework, by 26%. Additionally, it achieves 57% of the theoretical peak FLOP/s, or 182 PFLOP/s in total.
Submitted 14 May, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
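As a quick consistency check on the AxoNN figures, the achieved throughput (182 PFLOP/s) and the fraction of peak (57%) on 1024 GPUs together imply the aggregate and per-GPU peak being used. The sketch below only divides the numbers quoted in the abstract; the helper name `implied_peak` is hypothetical.

```python
def implied_peak(achieved_pflops, fraction_of_peak, num_gpus):
    """Back out the peak implied by an achieved rate and its fraction of peak."""
    total_pflops = achieved_pflops / fraction_of_peak
    per_gpu_tflops = total_pflops * 1000.0 / num_gpus  # 1 PFLOP/s = 1000 TFLOP/s
    return total_pflops, per_gpu_tflops


total, per_gpu = implied_peak(182.0, 0.57, 1024)
print(f"{total:.1f} PFLOP/s total, {per_gpu:.1f} TFLOP/s per GPU")
```

This prints roughly 319.3 PFLOP/s in aggregate and 311.8 TFLOP/s per GPU. That per-GPU value is close to the 312 TFLOP/s dense 16-bit tensor-core peak of the NVIDIA A100 GPUs in Perlmutter's GPU nodes, which suggests, though the abstract does not say so, that this is the peak being referenced.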
-
Critical Relaxed Stable Matchings with Ties in the Many-to-Many Setting
Authors:
Meghana Nasre,
Prajakta Nimbhorkar,
Keshav Ranjan
Abstract:
We study the many-to-many bipartite matching problem in the presence of preferences where ties, as well as lower quotas, may appear on both sides of the bipartition. The input is a bipartite graph $G=(A \cup B, E)$, where each vertex in $A \cup B$ has a positive upper quota and a non-negative lower quota denoting the maximum and minimum number of vertices that can be assigned to it from its neighborhood. Additionally, each vertex specifies a preference ordering, possibly containing ties, over its neighbors. A critical matching is a matching which fulfills vertex lower quotas to the maximum possible extent. We seek to compute a matching that is critical as well as optimal with respect to the preferences of vertices. Stability, a well-accepted notion of optimality in the presence of two-sided preferences, is generalized to weak stability in the presence of ties. However, a matching that is critical as well as weakly stable may not exist. Popularity is another well-investigated notion of optimality for the two-sided preference model; however, in the presence of ties (even without lower quotas), a popular matching may not exist. We, therefore, consider the notion of relaxed stability, which was introduced and studied by Krishnaa, Limaye, Nasre, and Nimbhorkar (JoCO 2023). We show that a critical matching that is relaxed stable always exists, although computing a maximum-size relaxed stable matching turns out to be NP-hard. Our main contribution is a $\frac{3}{2}$-approximation algorithm for computing a maximum-size critical relaxed stable matching.
Submitted 12 April, 2025; v1 submitted 22 March, 2023;
originally announced March 2023.
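The abstract defines a critical matching as one that fulfills lower quotas "to the maximum possible extent". One natural way to measure this, assumed here for illustration, is the total deficiency: the sum over all vertices of how far the vertex falls short of its lower quota. A critical matching is then a matching, feasible under the upper quotas, with minimum deficiency. The brute-force sketch below checks this on a tiny instance; all function names are hypothetical, and it ignores preferences and relaxed stability entirely.

```python
from itertools import combinations


def degrees(matching):
    """Number of matched edges at each vertex."""
    deg = {}
    for a, b in matching:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def is_feasible(matching, upper):
    """True if no vertex exceeds its upper quota."""
    return all(d <= upper[v] for v, d in degrees(matching).items())


def deficiency(matching, lower):
    """Total shortfall of the matching below the lower quotas."""
    deg = degrees(matching)
    return sum(max(0, q - deg.get(v, 0)) for v, q in lower.items())


def min_deficiency(edges, upper, lower):
    """Smallest deficiency over all feasible edge subsets (tiny inputs only)."""
    best = None
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            if is_feasible(subset, upper):
                d = deficiency(subset, lower)
                if best is None or d < best:
                    best = d
    return best


edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
upper = {"a1": 2, "a2": 1, "b1": 1, "b2": 1}
lower = {"a1": 2, "a2": 1, "b1": 0, "b2": 0}
print(min_deficiency(edges, upper, lower))
```

In this instance b1 can serve only one of a1 and a2, so no matching meets every lower quota and the minimum deficiency is 1; both {a1-b1, a1-b2} and {a1-b2, a2-b1} are critical under this measure.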
-
Popular Critical Matchings in the Many-to-Many Setting
Authors:
Meghana Nasre,
Prajakta Nimbhorkar,
Keshav Ranjan,
Ankita Sarkar
Abstract:
We consider the many-to-many bipartite matching problem in the presence of two-sided preferences and two-sided lower quotas. The input to our problem is a bipartite graph $G=(A \cup B, E)$, where each vertex in $A \cup B$ specifies a strict preference ordering over its neighbors. Each vertex has an upper quota and a lower quota denoting the maximum and minimum number of vertices that can be assigned to it from its neighborhood. In the many-to-many setting with two-sided lower quotas, informally, a critical matching is a matching which fulfils vertex lower quotas to the maximum possible extent. This is a natural generalization of the definition of critical matching in the one-to-one setting [Kavitha T., FSTTCS 2021]. Our goal in the given problem is to find a popular matching in the set of critical matchings. A matching is popular in a given set of matchings if it remains undefeated in a head-to-head election with any matching in that set. Here, vertices cast votes between pairs of matchings. We show that there always exists a matching that is popular in the set of critical matchings. We present an efficient algorithm to compute such a matching of the largest size. We prove the popularity of our matching using a dual certificate.
Submitted 19 March, 2023; v1 submitted 24 June, 2022;
originally announced June 2022.
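The head-to-head election behind popularity can be made concrete. The sketch below simplifies to the one-to-one setting with strict preferences (in the many-to-many setting of the paper, a vertex with a larger quota takes part in more than one comparison): each vertex votes for the matching that gives it the better partner, being matched is preferred to being unmatched, and indifferent vertices abstain. A matching M is popular within a set if no matching in that set collects more votes than M. Function names and the example instance are hypothetical.

```python
def rank(prefs, v, partner):
    """Position of partner in v's list; lower is better, unmatched is worst."""
    if partner is None:
        return len(prefs[v])
    return prefs[v].index(partner)


def partner_map(matching):
    """Map each matched vertex to its partner."""
    partners = {}
    for a, b in matching:
        partners[a] = b
        partners[b] = a
    return partners


def head_to_head(m1, m2, prefs):
    """Return (votes for m1, votes for m2); indifferent vertices abstain."""
    p1, p2 = partner_map(m1), partner_map(m2)
    votes1 = votes2 = 0
    for v in prefs:
        r1 = rank(prefs, v, p1.get(v))
        r2 = rank(prefs, v, p2.get(v))
        if r1 < r2:
            votes1 += 1
        elif r2 < r1:
            votes2 += 1
    return votes1, votes2


prefs = {"a1": ["b1", "b2"], "a2": ["b1"], "b1": ["a2", "a1"], "b2": ["a1"]}
m1 = [("a1", "b1")]
m2 = [("a1", "b2"), ("a2", "b1")]
print(head_to_head(m1, m2, prefs))
```

Here only a1 prefers m1, while a2, b1, and b2 prefer m2, so m2 wins 3 to 1 and m1 is not popular.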
-
On vertex-edge and independent vertex-edge domination
Authors:
Subhabrata Paul,
Keshav Ranjan
Abstract:
Given a graph $G = (V,E)$, a vertex $u \in V$ ve-dominates all edges incident to any vertex of $N_G[u]$. A set $S \subseteq V$ is a ve-dominating set if for every edge $e \in E$, there exists a vertex $u \in S$ such that $u$ ve-dominates $e$. Lewis [Ph.D. thesis, 2007] proposed a linear-time algorithm for the ve-domination problem in trees. In this paper, we first construct an example on which the proposed algorithm fails. We then propose a linear-time algorithm for the ve-domination problem in block graphs, which form a superclass of trees. We also prove that finding a minimum ve-dominating set is NP-complete for undirected path graphs. Finally, we characterize the trees with equal ve-domination and independent ve-domination numbers.
Submitted 8 October, 2019;
originally announced October 2019.
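The ve-domination definition translates directly into a checker: a vertex u covers every edge incident to a vertex of its closed neighborhood N[u], and S is ve-dominating if the union of these covered edges is all of E. The sketch below checks a candidate set directly from that definition, plus independence (no two vertices of S adjacent). It is not the paper's linear-time algorithm for block graphs; all function names are hypothetical, and the graph is an adjacency dict of sets.

```python
def closed_neighborhood(adj, u):
    """N[u]: u together with its neighbors."""
    return {u} | adj[u]


def ve_dominated_edges(adj, u):
    """Edges incident to some vertex of N[u], stored as frozensets."""
    edges = set()
    for w in closed_neighborhood(adj, u):
        for x in adj[w]:
            edges.add(frozenset((w, x)))
    return edges


def is_ve_dominating(adj, s):
    """True if every edge of the graph is ve-dominated by some vertex of s."""
    all_edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    covered = set()
    for u in s:
        covered |= ve_dominated_edges(adj, u)
    return all_edges <= covered


def is_independent(adj, s):
    """True if no two vertices of s are adjacent."""
    return all(v not in adj[u] for u in s for v in s)


# Path 1-2-3-4-5-6.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
print(is_ve_dominating(adj, {3}), is_ve_dominating(adj, {2, 5}))
```

On the path, {3} misses the edge 5-6, while {2, 5} covers all five edges and is also independent, so it is an independent ve-dominating set.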