-
GroundHog: Revolutionizing GLDAS Groundwater Storage Downscaling for Enhanced Recharge Estimation in Bangladesh
Authors:
Saleh Sakib Ahmed,
Rashed Uz Zzaman,
Saifur Rahman Jony,
Faizur Rahman Himel,
Afroza Sharmin,
A. H. M. Khalequr Rahman,
M. Sohel Rahman,
Sara Nowreen
Abstract:
Long-term groundwater level (GWL) measurement is vital for effective policymaking and recharge estimation using annual maxima and minima. However, current methods prioritize short-term predictions and lack multi-year applicability, limiting their utility. Moreover, sparse in-situ measurements lead to reliance on low-resolution satellite data like GLDAS as the ground truth for Machine Learning models, further constraining accuracy. To overcome these challenges, we first develop an ML model to mitigate data gaps, achieving $R^2$ scores of 0.855 and 0.963 for maximum and minimum GWL predictions, respectively. Subsequently, using these predictions and well observations as ground truth, we train an Upsampling Model that uses low-resolution (25 km) GLDAS data as input to produce high-resolution (2 km) GWLs, achieving an excellent $R^2$ score of 0.96. Our approach successfully upscales GLDAS data for 2003-2024, allowing high-resolution recharge estimations and revealing critical trends for proactive resource management. Our method allows upsampling of groundwater storage (GWS) from GLDAS to high-resolution GWLs for any points independently of officially curated piezometer data, making it a valuable tool for decision-making.
Submitted 28 March, 2025;
originally announced March 2025.
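The GroundHog abstract above reports its results as $R^2$ scores. As a minimal sketch, assuming the standard coefficient of determination (the abstract does not state which variant the authors used), the score can be computed as follows. The function name `r2_score` and the sample values are illustrative, not taken from the paper.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the standard definition; the abstract does not
    say which variant of R^2 was used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical groundwater levels (metres), for illustration only.
levels = [1.0, 2.0, 3.0]
print(r2_score(levels, [1.0, 2.0, 3.0]))  # perfect prediction
print(r2_score(levels, [2.0, 2.0, 2.0]))  # no better than the mean
```

A score of 1 means the predictions match the observations exactly, and a score of 0 means the model does no better than predicting the mean observation.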
-
A Scalable Method for Readable Tree Layouts
Authors:
Kathryn Gray,
Mingwei Li,
Reyan Ahmed,
Md. Khaledur Rahman,
Ariful Azad,
Stephen Kobourov,
Katy Börner
Abstract:
Large tree structures are ubiquitous and real-world relational datasets often have information associated with nodes (e.g., labels or other attributes) and edges (e.g., weights or distances) that need to be communicated to the viewers. Yet, scalable, easy to read tree layouts are difficult to achieve. We consider tree layouts to be readable if they meet some basic requirements: node labels should not overlap, edges should not cross, edge lengths should be preserved, and the output should be compact. There are many algorithms for drawing trees, although very few take node labels or edge lengths into account, and none optimizes all requirements above. With this in mind, we propose a new scalable method for readable tree layouts. The algorithm guarantees that the layout has no edge crossings and no label overlaps, and optimizes one of the remaining aspects: desired edge lengths and compactness. We evaluate the performance of the new algorithm by comparison with related earlier approaches using several real-world datasets, ranging from a few thousand nodes to hundreds of thousands of nodes. Tree layout algorithms can be used to visualize large general graphs, by extracting a hierarchy of progressively larger trees. We illustrate this functionality by presenting several map-like visualizations generated by the new tree layout algorithm.
Submitted 16 May, 2023;
originally announced May 2023.
-
Triple Sparsification of Graph Convolutional Networks without Sacrificing the Accuracy
Authors:
Md. Khaledur Rahman,
Ariful Azad
Abstract:
Graph Neural Networks (GNNs) are widely used to perform different machine learning tasks on graphs. As graphs grow larger and GNNs get deeper, training and inference become costly in both time and memory. Graph sparsification and model compression therefore become viable approaches for graph learning tasks, provided they do not sacrifice accuracy. A few existing techniques study only the sparsification of graphs and GNN models. In this paper, we develop a SparseGCN pipeline to study all possible sparsification in GNNs. We provide a theoretical analysis and empirically show that it can add up to 11.6% additional sparsity to the embedding matrix without sacrificing accuracy on commonly used benchmark graph datasets.
Submitted 6 August, 2022;
originally announced August 2022.
-
MarkovGNN: Graph Neural Networks on Markov Diffusion
Authors:
Md. Khaledur Rahman,
Abhigya Agrawal,
Ariful Azad
Abstract:
Most real-world networks contain well-defined community structures where nodes are densely connected internally within communities. To learn from these networks, we develop MarkovGNN that captures the formation and evolution of communities directly in different convolutional layers. Unlike most Graph Neural Networks (GNNs) that consider a static graph at every layer, MarkovGNN generates different stochastic matrices using a Markov process and then uses these community-capturing matrices in different layers. MarkovGNN is a general approach that could be used with most existing GNNs. We experimentally show that MarkovGNN outperforms other GNNs for clustering, node classification, and visualization tasks. The source code of MarkovGNN is publicly available at https://github.com/HipGraph/MarkovGNN.
Submitted 29 April, 2022; v1 submitted 4 February, 2022;
originally announced February 2022.
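The MarkovGNN abstract above says that a Markov process generates a different stochastic matrix for each layer, but it does not spell out the process. The sketch below assumes something in the spirit of Markov Clustering (expansion by squaring, then inflation by element-wise powering and renormalizing); this is a swapped-in illustration, not the authors' implementation. The names `markov_matrices`, `num_layers`, and `inflation` are hypothetical.

```python
def normalize_columns(matrix):
    """Scale each column to sum to 1, giving a column-stochastic matrix."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Dense matrix product of two square lists-of-lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def markov_matrices(adjacency, num_layers, inflation=2.0):
    """Return one column-stochastic matrix per layer.

    Assumption: the process is MCL-style expansion and inflation; the
    abstract only says "a Markov process".
    """
    n = len(adjacency)
    # Self-loops keep every column sum positive before normalizing.
    with_loops = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    current = normalize_columns(with_loops)
    matrices = [current]
    for _ in range(num_layers - 1):
        expanded = matmul(current, current)  # expansion: two-step random walk
        inflated = [[v ** inflation for v in row] for row in expanded]
        current = normalize_columns(inflated)  # inflation sharpens strong flows
        matrices.append(current)
    return matrices


# A small graph: a triangle (nodes 0, 1, 2) with a pendant node 3.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
for layer, m in enumerate(markov_matrices(adjacency, num_layers=3)):
    print(layer, [round(v, 3) for v in m[0]])
```

Under this assumption, layer k of a GNN would aggregate neighbor features using the k-th matrix instead of a fixed normalized adjacency matrix.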
-
A Comprehensive Analytical Survey on Unsupervised and Semi-Supervised Graph Representation Learning Methods
Authors:
Md. Khaledur Rahman,
Ariful Azad
Abstract:
Graph representation learning is a fast-growing field where one of the main objectives is to generate meaningful representations of graphs in lower-dimensional spaces. The learned embeddings have been successfully applied to perform various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics such as prediction accuracy, running time, scalability, etc. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selections, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organized graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluated these classes of algorithms for node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We designed our experiments on top of the PyTorch Geometric and DGL libraries and ran them on different multicore CPU and GPU platforms. We rigorously scrutinized the performance of embedding methods under various performance metrics and summarized the results. Thus, this paper may serve as a comparative guide to help users select methods that are most suitable for their tasks.
Submitted 20 December, 2021;
originally announced December 2021.
-
An Analytical Survey on Recent Trends in High Dimensional Data Visualization
Authors:
Alexander Kiefer,
Md. Khaledur Rahman
Abstract:
Data visualization is the process by which data of any size or dimensionality is processed to produce an understandable set of data in a lower dimensionality, allowing it to be manipulated and understood more easily by people. The goal of our paper is to survey the performance of current high-dimensional data visualization techniques and quantify their strengths and weaknesses through relevant quantitative measures, including runtime, memory usage, clustering quality, separation quality, global structure preservation, and local structure preservation. To perform the analysis, we select a subset of state-of-the-art methods. Our work shows how the selected algorithms produce embeddings with unique qualities that lend themselves to certain tasks, and how each of these algorithms is constrained by compute resources.
Submitted 5 July, 2021;
originally announced July 2021.
-
FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks
Authors:
Md. Khaledur Rahman,
Majedul Haque Sujon,
Ariful Azad
Abstract:
We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from the low-level vectorized kernels, a suitable load balancing scheme and an efficient utilization of the memory bandwidth. FusedMM can tune its performance using a code generator and perform equally well on Intel, AMD and ARM processors. FusedMM speeds up an end-to-end graph embedding algorithm by up to 28x on different processors.
Submitted 26 October, 2021; v1 submitted 7 November, 2020;
originally announced November 2020.
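The FusedMM abstract above describes unifying sampled dense-dense matrix multiplication (SDDMM) and sparse-dense matrix multiplication (SpMM) in one operation. The pure-Python sketch below illustrates the idea only: it is not FusedMM's API or its vectorized kernel. The dict-of-dicts sparse format, the function names, and the `edge_fn` hook (standing in for the "user-defined functions" mentioned in the abstract) are all assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sddmm(adj, x, y):
    """S[i][j] = adj[i][j] * <x[i], y[j]>, only at the nonzeros of adj."""
    return {
        i: {j: a * dot(x[i], y[j]) for j, a in row.items()}
        for i, row in adj.items()
    }


def spmm(sparse, y, n_rows):
    """Z = sparse @ y for a dict-of-dicts sparse matrix and dense y."""
    dim = len(y[0])
    z = [[0.0] * dim for _ in range(n_rows)]
    for i, row in sparse.items():
        for j, s in row.items():
            for k in range(dim):
                z[i][k] += s * y[j][k]
    return z


def fused_mm(adj, x, y, n_rows, edge_fn=lambda a, d: a * d):
    """Same result as spmm(sddmm(...)) without storing the intermediate S.

    edge_fn is a hypothetical hook for a user-defined per-edge operation.
    """
    dim = len(y[0])
    z = [[0.0] * dim for _ in range(n_rows)]
    for i, row in adj.items():
        for j, a in row.items():
            s = edge_fn(a, dot(x[i], y[j]))  # SDDMM step for one edge
            for k in range(dim):             # SpMM step, applied immediately
                z[i][k] += s * y[j][k]
    return z


adj = {0: {1: 1.0}, 1: {0: 2.0, 1: 1.0}}  # sparse pattern with edge weights
x = [[1.0, 0.0], [0.0, 1.0]]
y = [[1.0, 2.0], [3.0, 4.0]]
print(spmm(sddmm(adj, x, y), y, n_rows=2))  # [[9.0, 12.0], [16.0, 24.0]]
print(fused_mm(adj, x, y, n_rows=2))        # [[9.0, 12.0], [16.0, 24.0]]
```

The fused loop visits each nonzero once and never writes the intermediate sparse matrix to memory, which is one plausible reading of the memory-bandwidth benefit the abstract mentions.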
-
Force2Vec: Parallel force-directed graph embedding
Authors:
Md. Khaledur Rahman,
Majedul Haque Sujon,
Ariful Azad
Abstract:
A graph embedding algorithm embeds a graph into a low-dimensional space such that the embedding preserves the inherent properties of the graph. While graph embedding is fundamentally related to graph visualization, prior work did not exploit this connection explicitly. We develop Force2Vec that uses force-directed graph layout models in a graph embedding setting with an aim to excel in both machine learning (ML) and visualization tasks. We make Force2Vec highly parallel by mapping its core computations to linear algebra and utilizing multiple levels of parallelism available in modern processors. The resultant algorithm is an order of magnitude faster than existing methods (43x faster than DeepWalk, on average) and can generate embeddings from graphs with billions of edges in a few hours. In comparison to existing methods, Force2Vec is better in graph visualization and performs comparably or better in ML tasks such as link prediction, node classification, and clustering. Source code is available at https://github.com/HipGraph/Force2Vec.
Submitted 16 September, 2020;
originally announced September 2020.
-
Training Sensitivity in Graph Isomorphism Network
Authors:
Md. Khaledur Rahman
Abstract:
Graph neural networks (GNNs) are a popular tool for learning lower-dimensional representations of graphs. They facilitate machine learning tasks on graphs by incorporating domain-specific features. There are various options for the underlying procedures (such as optimization functions, activation functions, etc.) that can be considered in the implementation of a GNN. However, most of the existing tools are confined to one approach without any analysis. Thus, this emerging field lacks a robust implementation that accounts for the highly irregular structure of real-world graphs. In this paper, we attempt to fill this gap by studying various alternative functions for each module using a diverse set of benchmark datasets. Our empirical results suggest that the generally used underlying techniques do not always capture the overall structure of a set of graphs well.
Submitted 18 August, 2020;
originally announced August 2020.
-
BatchLayout: A Batch-Parallel Force-Directed Graph Layout Algorithm in Shared Memory
Authors:
Md. Khaledur Rahman,
Majedul Haque Sujon,
Ariful Azad
Abstract:
Force-directed algorithms are widely used to generate aesthetically pleasing layouts of graphs or networks arising in many scientific disciplines. To visualize large-scale graphs, several parallel algorithms have been discussed in the literature. However, existing parallel algorithms do not utilize memory hierarchy efficiently and often offer limited parallelism. This paper addresses these limitations with BatchLayout, an algorithm that groups vertices into minibatches and processes them in parallel. BatchLayout also employs cache blocking techniques to utilize memory hierarchy efficiently. More parallelism and improved memory accesses coupled with force approximating techniques, better initialization, and optimized learning rate make BatchLayout significantly faster than other state-of-the-art algorithms such as ForceAtlas2 and OpenOrd. The visualization quality of layouts from BatchLayout is comparable to or better than that of similar visualization tools. All of our source code, links to datasets, results and log files are available at https://github.com/khaled-rahman/BatchLayout.
Submitted 11 February, 2020;
originally announced February 2020.
-
Prefix Block-Interchanges on Binary and Ternary Strings
Authors:
Md. Khaledur Rahman,
M. Sohel Rahman
Abstract:
The genome rearrangement problem computes the minimum number of operations that are required to sort all elements of a permutation. A block-interchange operation exchanges two blocks of a permutation which are not necessarily adjacent and in a prefix block-interchange, one block is always the prefix of that permutation. In this paper, we focus on applying prefix block-interchanges on binary and ternary strings. We present upper bounds to group and sort a given binary/ternary string. We also provide upper bounds for a different version of the block-interchange operation which we refer to as the 'restricted prefix block-interchange'. We observe that our obtained upper bound for restricted prefix block-interchange operations on binary strings is better than that of other genome rearrangement operations to group fully normalized binary strings. Consequently, we provide a linear-time algorithm to solve the problem of grouping binary normalized strings by restricted prefix block-interchanges. We also provide a polynomial time algorithm to group normalized ternary strings by prefix block-interchange operations. Finally, we provide a classification for ternary strings based on the required number of prefix block-interchange operations.
Submitted 19 May, 2019;
originally announced June 2019.
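The abstract above defines a prefix block-interchange as an exchange of two blocks, not necessarily adjacent, where one block is always the prefix. A minimal sketch of that single operation on a string follows. The function name and the index convention (`i`, `j`, `k` as slice boundaries) are assumptions for illustration; the paper's own notation, and its 'restricted' variant, may differ.

```python
def prefix_block_interchange(s, i, j, k):
    """Swap the prefix s[:i] with the block s[j:k].

    Requires 0 < i <= j < k <= len(s), so both blocks are non-empty and
    do not overlap. The middle part s[i:j] and the tail s[k:] stay in place.
    """
    if not (0 < i <= j < k <= len(s)):
        raise ValueError("need 0 < i <= j < k <= len(s)")
    return s[j:k] + s[i:j] + s[:i] + s[k:]


# One operation groups the binary string 1001 into 0011:
# the prefix "1" is exchanged with the block "0" at position 2.
print(prefix_block_interchange("1001", 1, 2, 3))   # 0011

# The blocks need not be adjacent: "ab" and "de" swap around "c".
print(prefix_block_interchange("abcde", 2, 3, 5))  # decab
```

Grouping here means collecting equal symbols into contiguous runs, so `0011` is a grouped form of `1001`.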
-
NEDindex: A new metric for community structure in networks
Authors:
Md. Khaledur Rahman
Abstract:
There are several metrics (Modularity, Mutual Information, Conductance, etc.) to evaluate the strength of graph clustering in large graphs. These metrics are important for measuring the effectiveness of a clustering, and they are often used to find strongly connected clusters with respect to the whole graph. In this paper, we propose a new metric to evaluate the strength of graph clustering and also study its applications. We show that our proposed metric is as consistent as other metrics and is easy to calculate. Our proposed metric also remains consistent in some special cases where other metrics fail. We demonstrate that our metric has reasonable strength while extracting strongly connected communities in both simulated (in silico) data and real data networks. We also show some comparative results of our proposed metric with other popular metrics for Online Social Networks (OSN) and Gene Regulatory Networks (GRN).
Submitted 25 January, 2016;
originally announced October 2016.