-
Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures
Authors:
Vijay Prakash Dwivedi,
Charilaos Kanatsoulis,
Shenyang Huang,
Jure Leskovec
Abstract:
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data and has been applied to molecules, social networks, recommendation systems, and transportation, among other domains. Data in multi-tabular relational databases can also be constructed as 'relational entity graphs' for Relational Deep Learning (RDL) - a new blueprint…
▽ More
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data and has been applied to molecules, social networks, recommendation systems, and transportation, among other domains. Data in multi-tabular relational databases can also be constructed as 'relational entity graphs' for Relational Deep Learning (RDL) - a new blueprint that enables end-to-end representation learning without traditional feature engineering. Compared to arbitrary graph-structured data, relational entity graphs have key properties: (i) their structure is defined by primary-foreign key relationships between entities in different tables, (ii) the structural connectivity is a function of the relational schema defining a database, and (iii) the graph connectivity is temporal and heterogeneous in nature. In this paper, we provide a comprehensive review of RDL by first introducing the representation of relational databases as relational entity graphs, and then reviewing public benchmark datasets that have been used to develop and evaluate recent GNN-based RDL models. We discuss key challenges including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data, while also surveying foundational neural network methods and recent architectural advances specialized for relational entity graphs. Finally, we explore opportunities to unify these distinct modeling challenges, highlighting how RDL converges multiple sub-fields in graph machine learning towards the design of foundation models that can transform the processing of relational data.
△ Less
Submitted 19 June, 2025;
originally announced June 2025.
-
Large Language Models are Good Relational Learners
Authors:
Fang Wu,
Vijay Prakash Dwivedi,
Jure Leskovec
Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities across various domains, yet their application to relational deep learning (RDL) remains underexplored. Existing approaches adapt LLMs by traversing relational links between entities in a database and converting the structured data into flat text documents. Still, this text-based serialization disregards critical relational stru…
▽ More
Large language models (LLMs) have demonstrated remarkable capabilities across various domains, yet their application to relational deep learning (RDL) remains underexplored. Existing approaches adapt LLMs by traversing relational links between entities in a database and converting the structured data into flat text documents. Still, this text-based serialization disregards critical relational structures, introduces redundancy, and often exceeds standard LLM context lengths. We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)- based encoder to generate structured relational prompts for LLMs within a retrieval-augmented generation (RAG) framework. Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to effectively process and reason over complex entity relationships. Specifically, the GNN encoder extracts a local subgraph around an entity to build feature representations that contain relevant entity relationships and temporal dependencies. These representations are transformed into structured prompts using a denormalization process, effectively allowing the LLM to reason over relational structures. Through extensive experiments, we demonstrate that Rel-LLM outperforms existing methods on key RDL tasks, offering a scalable and efficient approach to integrating LLMs with structured data sources. Code is available at https://github.com/smiles724/Rel-LLM.
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
Relational Graph Transformer
Authors:
Vijay Prakash Dwivedi,
Sri Jaladi,
Yangyi Shen,
Federico López,
Charilaos I. Kanatsoulis,
Rishi Puri,
Matthias Fey,
Jure Leskovec
Abstract:
Relational Deep Learning (RDL) is a promising approach for building state-of-the-art predictive models on multi-table relational data by representing it as a heterogeneous temporal graph. However, commonly used Graph Neural Network models suffer from fundamental limitations in capturing complex structural patterns and long-range dependencies that are inherent in relational data. While Graph Transf…
▽ More
Relational Deep Learning (RDL) is a promising approach for building state-of-the-art predictive models on multi-table relational data by representing it as a heterogeneous temporal graph. However, commonly used Graph Neural Network models suffer from fundamental limitations in capturing complex structural patterns and long-range dependencies that are inherent in relational data. While Graph Transformers have emerged as powerful alternatives to GNNs on general graphs, applying them to relational entity graphs presents unique challenges: (i) Traditional positional encodings fail to generalize to massive, heterogeneous graphs; (ii) existing architectures cannot model the temporal dynamics and schema constraints of relational data; (iii) existing tokenization schemes lose critical structural information. Here we introduce the Relational Graph Transformer (RelGT), the first graph transformer architecture designed specifically for relational tables. RelGT employs a novel multi-element tokenization strategy that decomposes each node into five components (features, type, hop distance, time, and local structure), enabling efficient encoding of heterogeneity, temporality, and topology without expensive precomputation. Our architecture combines local attention over sampled subgraphs with global attention to learnable centroids, incorporating both local and database-wide representations. Across 21 tasks from the RelBench benchmark, RelGT consistently matches or outperforms GNN baselines by up to 18%, establishing Graph Transformers as a powerful architecture for Relational Deep Learning.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Curriculum Learning-Driven PIELMs for Fluid Flow Simulations
Authors:
Vikas Dwivedi,
Bruno Sixou,
Monica Sigovan
Abstract:
This paper presents two novel, physics-informed extreme learning machine (PIELM)-based algorithms for solving steady and unsteady nonlinear partial differential equations (PDEs) related to fluid flow. Although single-hidden-layer PIELMs outperform deep physics-informed neural networks (PINNs) in speed and accuracy for linear and quasilinear PDEs, their extension to nonlinear problems remains chall…
▽ More
This paper presents two novel, physics-informed extreme learning machine (PIELM)-based algorithms for solving steady and unsteady nonlinear partial differential equations (PDEs) related to fluid flow. Although single-hidden-layer PIELMs outperform deep physics-informed neural networks (PINNs) in speed and accuracy for linear and quasilinear PDEs, their extension to nonlinear problems remains challenging. To address this, we introduce a curriculum learning strategy that reformulates nonlinear PDEs as a sequence of increasingly complex quasilinear PDEs. Additionally, our approach enables a physically interpretable initialization of network parameters by leveraging Radial Basis Functions (RBFs). The performance of the proposed algorithms is validated on two benchmark incompressible flow problems: the viscous Burgers equation and lid-driven cavity flow. To the best of our knowledge, this is the first work to extend PIELM to solving Burgers' shock solution as well as lid-driven cavity flow up to a Reynolds number of 100. As a practical application, we employ PIELM to predict blood flow in a stenotic vessel. The results confirm that PIELM efficiently handles nonlinear PDEs, positioning it as a promising alternative to PINNs for both linear and nonlinear PDEs.
△ Less
Submitted 8 March, 2025;
originally announced March 2025.
-
Representation Learning of Structured Data for Medical Foundation Models
Authors:
Vijay Prakash Dwivedi,
Viktor Schlegel,
Andy T. Liu,
Thanh-Tung Nguyen,
Abhinav Ramesh Kashyap,
Jeng Wei,
Wei-Hsian Yin,
Stefan Winkler,
Robby T. Tan
Abstract:
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing me…
▽ More
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing medical codes due to the shortcomings of current tokenization methods. As a result, we introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data, which addresses these challenges by adapting subword tokenization techniques specifically for the structured medical codes. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records. Trained on over 1 billion tokens on the internal medical database, the proposed model achieves up to a 23% improvement in evaluation metrics, with around 2% gain attributed to our proposed tokenization. Additionally, when evaluated on the EHRSHOT public benchmark with a 1/1000 fraction of the pre-training data, the UniStruct model improves performance on over 42% of the downstream tasks. Our approach not only enhances the representation and generalization capabilities of patient-centric models but also bridges a critical gap in representation learning models' ability to handle complex structured medical data, alongside unstructured text.
△ Less
Submitted 17 October, 2024;
originally announced October 2024.
-
MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues
Authors:
Kuluhan Binici,
Abhinav Ramesh Kashyap,
Viktor Schlegel,
Andy T. Liu,
Vijay Prakash Dwivedi,
Thanh-Tung Nguyen,
Xiaoxue Gao,
Nancy F. Chen,
Stefan Winkler
Abstract:
Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solut…
▽ More
Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solutions. Employing conventional data augmentation for enhancing the noise robustness of summarization models is not feasible either due to the unavailability of sufficient medical dialogue audio recordings and corresponding ASR transcripts. To address this challenge, we propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models (LLMs). Specifically, we leverage the in-context learning capabilities of LLMs and instruct them to generate ASR-like errors based on a few available medical dialogue examples with audio recordings. Experimental results show that LLMs can effectively model ASR noise, and incorporating this noisy data into the training process significantly improves the robustness and accuracy of medical dialogue summarization systems. This approach addresses the challenges of noisy ASR outputs in critical applications, offering a robust solution to enhance the reliability of clinical dialogue summarization.
△ Less
Submitted 8 January, 2025; v1 submitted 26 August, 2024;
originally announced August 2024.
-
uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization
Authors:
Aishik Nagar,
Yutong Liu,
Andy T. Liu,
Viktor Schlegel,
Vijay Prakash Dwivedi,
Arun-Kumar Kaliya-Perumal,
Guna Pratheep Kalanchiam,
Yili Tang,
Robby T. Tan
Abstract:
Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as fai…
▽ More
Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as faithfulness and informativeness without considering advanced methods like model reasoning and self-improvement. Moreover, the field lacks a unified benchmark, hindering systematic evaluation due to varied metrics and datasets. This paper addresses these gaps by presenting a comprehensive benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics. Building on these findings, we propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition, ensuring both faithfulness and informativeness. Our work improves upon previous GPT-4-based state-of-the-art (SOTA) medical summarization methods, significantly outperforming them in both quantitative metrics and qualitative domain expert evaluations. Notably, we achieve an average relative performance improvement of 11.8% in reference-free metrics over the previous SOTA. Doctors prefer uMedSum's summaries 6 times more than previous SOTA in difficult cases where there are chances of confabulations or missing information. These results highlight uMedSum's effectiveness and generalizability across various datasets and metrics, marking a significant advancement in medical summarization.
△ Less
Submitted 25 August, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering
Authors:
Anand Subramanian,
Viktor Schlegel,
Abhinav Ramesh Kashyap,
Thanh-Tung Nguyen,
Vijay Prakash Dwivedi,
Stefan Winkler
Abstract:
There is vivid research on adapting Large Language Models (LLMs) to perform a variety of tasks in high-stakes domains such as healthcare. Despite their popularity, there is a lack of understanding of the extent and contributing factors that allow LLMs to recall relevant knowledge and combine it with presented information in the clinical and biomedical domain: a fundamental pre-requisite for succes…
▽ More
There is vivid research on adapting Large Language Models (LLMs) to perform a variety of tasks in high-stakes domains such as healthcare. Despite their popularity, there is a lack of understanding of the extent and contributing factors that allow LLMs to recall relevant knowledge and combine it with presented information in the clinical and biomedical domain: a fundamental pre-requisite for success on down-stream tasks. Addressing this gap, we use Multiple Choice and Abstractive Question Answering to conduct a large-scale empirical study on 22 datasets in three generalist and three specialist biomedical sub-domains. Our multifaceted analysis of the performance of 15 LLMs, further broken down by sub-domain, source of knowledge and model architecture, uncovers success factors such as instruction tuning that lead to improved recall and comprehension. We further show that while recently proposed domain-adapted models may lack adequate knowledge, directly fine-tuning on our collected medical knowledge datasets shows encouraging results, even generalising to unseen specialist sub-domains. We complement the quantitative results with a skill-oriented manual error analysis, which reveals a significant gap between the models' capabilities to simply recall necessary knowledge and to integrate it with the presented context. To foster research and collaboration in this field we share M-QALM, our resources, standardised methodology, and evaluation results, with the research community to facilitate further advancements in clinical knowledge representation learning within language models.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Global versus Local: Evaluating AlexNet Architectures for Tropical Cyclone Intensity Estimation
Authors:
Vikas Dwivedi
Abstract:
Given the destructive impacts of tropical cyclones, it is critical to have a reliable system for cyclone intensity detection. Various techniques are available for this purpose, each with differing levels of accuracy. In this paper, we introduce two ensemble-based models based on AlexNet architecture to estimate tropical cyclone intensity using visible satellite images. The first model, trained on…
▽ More
Given the destructive impacts of tropical cyclones, it is critical to have a reliable system for cyclone intensity detection. Various techniques are available for this purpose, each with differing levels of accuracy. In this paper, we introduce two ensemble-based models based on AlexNet architecture to estimate tropical cyclone intensity using visible satellite images. The first model, trained on the entire dataset, is called the global AlexNet model. The second model is a distributed version of AlexNet in which multiple AlexNets are trained separately on subsets of the training data categorized according to the Saffir-Simpson wind speed scale prescribed by the meterologists. We evaluated the performance of both models against a deep learning benchmark model called \textit{Deepti} using a publicly available cyclone image dataset. Results indicate that both the global model (with a root mean square error (RMSE) of 9.03 knots) and the distributed model (with a RMSE of 9.3 knots) outperform the benchmark model (with a RMSE of 13.62 knots). We provide a thorough discussion of our solution approach, including an explanantion of the AlexNet's performance using gradient class activation maps (grad-CAM). Our proposed solution strategy allows future experimentation with various deep learning models in both single and multi-channel settings.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
Automated Clinical Coding for Outpatient Departments
Authors:
Viktor Schlegel,
Abhinav Ramesh Kashyap,
Thanh-Tung Nguyen,
Tsung-Han Yang,
Vijay Prakash Dwivedi,
Wei-Hsian Yin,
Jeng Wei,
Stefan Winkler
Abstract:
Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they pr…
▽ More
Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they present unique and distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from more innovations in popular inpatient coding benchmarks. A deeper analysis of the factors contributing to the success -- amount and form of data and choice of document representation -- reveals the presence of easy-to-solve examples, the coding of which can be completely automated with a low error rate.
△ Less
Submitted 24 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Graph Transformers for Large Graphs
Authors:
Vijay Prakash Dwivedi,
Yozen Liu,
Anh Tuan Luu,
Xavier Bresson,
Neil Shah,
Tong Zhao
Abstract:
Transformers have recently emerged as powerful neural networks for graph learning, showcasing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the computational feasibility of the global attention mechanism is possible. The next goal is to scale up these architectures to handle very large graphs on the sc…
▽ More
Transformers have recently emerged as powerful neural networks for graph learning, showcasing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the computational feasibility of the global attention mechanism is possible. The next goal is to scale up these architectures to handle very large graphs on the scale of millions or even billions of nodes. With large-scale graphs, global attention learning is proven impractical due to its quadratic complexity w.r.t. the number of nodes. On the other hand, neighborhood sampling techniques become essential to manage large graph sizes, yet finding the optimal trade-off between speed and accuracy with sampling techniques remains challenging. This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints for developing scalable graph transformer (GT) architectures. We argue such GT requires layers that can adeptly learn both local and global graph representations while swiftly sampling the graph topology. As such, a key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism that encompasses a 4-hop reception field, but achieved through just 2-hop operations. This local node embedding is then integrated with a global node embedding, acquired via another self-attention layer with an approximate global codebook, before finally sent through a downstream layer for node predictions. The proposed GT framework, named LargeGT, overcomes previous computational bottlenecks and is validated on three large-scale node classification benchmarks. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
PICS in Pics: Physics Informed Contour Selection for Rapid Image Segmentation
Authors:
Vikas Dwivedi,
Balaji Srinivasan,
Ganapathy Krishnamurthi
Abstract:
Effective training of deep image segmentation models is challenging due to the need for abundant, high-quality annotations. Generating annotations is laborious and time-consuming for human experts, especially in medical image segmentation. To facilitate image annotation, we introduce Physics Informed Contour Selection (PICS) - an interpretable, physics-informed algorithm for rapid image segmentati…
▽ More
Effective training of deep image segmentation models is challenging due to the need for abundant, high-quality annotations. Generating annotations is laborious and time-consuming for human experts, especially in medical image segmentation. To facilitate image annotation, we introduce Physics Informed Contour Selection (PICS) - an interpretable, physics-informed algorithm for rapid image segmentation without relying on labeled data. PICS draws inspiration from physics-informed neural networks (PINNs) and an active contour model called snake. It is fast and computationally lightweight because it employs cubic splines instead of a deep neural network as a basis function. Its training parameters are physically interpretable because they directly represent control knots of the segmentation curve. Traditional snakes involve minimization of the edge-based loss functionals by deriving the Euler-Lagrange equation followed by its numerical solution. However, PICS directly minimizes the loss functional, bypassing the Euler Lagrange equations. It is the first snake variant to minimize a region-based loss function instead of traditional edge-based loss functions. PICS uniquely models the three-dimensional (3D) segmentation process with an unsteady partial differential equation (PDE), which allows accelerated segmentation via transfer learning. To demonstrate its effectiveness, we apply PICS for 3D segmentation of the left ventricle on a publicly available cardiac dataset. While doing so, we also introduce a new convexity-preserving loss term that encodes the shape information of the left ventricle to enhance PICS's segmentation quality. Overall, PICS presents several novelties in network architecture, transfer learning, and physics-inspired losses for image segmentation, thereby showing promising outcomes and potential for further refinement.
△ Less
Submitted 12 November, 2023;
originally announced November 2023.
-
Union Subgraph Neural Networks
Authors:
Jiaxing Xu,
Aihu Zhang,
Qingtian Bian,
Vijay Prakash Dwivedi,
Yiping Ke
Abstract:
Graph Neural Networks (GNNs) are widely used for graph representation learning in many application domains. The expressiveness of vanilla GNNs is upper-bounded by 1-dimensional Weisfeiler-Leman (1-WL) test as they operate on rooted subtrees through iterative message passing. In this paper, we empower GNNs by injecting neighbor-connectivity information extracted from a new type of substructure. We…
▽ More
Graph Neural Networks (GNNs) are widely used for graph representation learning in many application domains. The expressiveness of vanilla GNNs is upper-bounded by 1-dimensional Weisfeiler-Leman (1-WL) test as they operate on rooted subtrees through iterative message passing. In this paper, we empower GNNs by injecting neighbor-connectivity information extracted from a new type of substructure. We first investigate different kinds of connectivities existing in a local neighborhood and identify a substructure called union subgraph, which is able to capture the complete picture of the 1-hop neighborhood of an edge. We then design a shortest-path-based substructure descriptor that possesses three nice properties and can effectively encode the high-order connectivities in union subgraphs. By infusing the encoded neighbor connectivities, we propose a novel model, namely Union Subgraph Neural Network (UnionSNN), which is proven to be strictly more powerful than 1-WL in distinguishing non-isomorphic graphs. Additionally, the local encoding from union subgraphs can also be injected into arbitrary message-passing neural networks (MPNNs) and Transformer-based models as a plugin. Extensive experiments on 18 benchmarks of both graph-level and node-level tasks demonstrate that UnionSNN outperforms state-of-the-art baseline models, with competitive computational efficiency. The injection of our local encoding to existing models is able to boost the performance by up to 11.09%. Our code is available at https://github.com/AngusMonroe/UnionSNN.
△ Less
Submitted 9 January, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Long Range Graph Benchmark
Authors:
Vijay Prakash Dwivedi,
Ladislav Rampášek,
Mikhail Galkin,
Ali Parviz,
Guy Wolf,
Anh Tuan Luu,
Dominique Beaini
Abstract:
Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been an increasing interest in development of T…
▽ More
Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been an increasing interest in development of Transformer-based methods for graphs that can consider full node connectivity beyond the original sparse structure, thus enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop message passing often fare better in several existing graph benchmarks when combined with positional feature representations, among other innovations, hence limiting the perceived utility and ranking of Transformer-like architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets: PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and Peptides-struct that arguably require LRI reasoning to achieve strong performance in a given task. We benchmark both baseline GNNs and Graph Transformer networks to verify that the models which capture long-range dependencies perform significantly better on these tasks. Therefore, these datasets are suitable for benchmarking and exploration of MP-GNNs and Graph Transformer architectures that are intended to capture LRI.
△ Less
Submitted 28 November, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Recipe for a General, Powerful, Scalable Graph Transformer
Authors:
Ladislav Rampášek,
Mikhail Galkin,
Vijay Prakash Dwivedi,
Anh Tuan Luu,
Guy Wolf,
Dominique Beaini
Abstract:
We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications but they lack a common foundation about what constitutes a good positional or structural encod…
▽ More
We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being $\textit{local}$, $\textit{global}$ or $\textit{relative}$. The prior GTs are constrained to small graphs with a few hundred nodes, here we propose the first architecture with a complexity linear in the number of nodes and edges $O(N+E)$ by decoupling the local real-edge aggregation from the fully-connected Transformer. We argue that this decoupling does not negatively affect the expressivity, with our architecture being a universal function approximator on graphs. Our GPS recipe consists of choosing 3 main ingredients: (i) positional/structural encoding, (ii) local message-passing mechanism, and (iii) global attention mechanism. We provide a modular framework $\textit{GraphGPS}$ that supports multiple types of encodings and that provides efficiency and scalability both in small and large graphs. We test our architecture on 16 benchmarks and show highly competitive results in all of them, show-casing the empirical benefits gained by the modularity and the combination of different strategies.
△ Less
Submitted 15 January, 2023; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Numerical Approximation in CFD Problems Using Physics Informed Machine Learning
Authors:
Siddharth Rout,
Vikas Dwivedi,
Balaji Srinivasan
Abstract:
The thesis focuses on various techniques to find an alternate approximation method that could be universally used for a wide range of CFD problems but with low computational cost and low runtime. Various techniques have been explored within the field of machine learning to gauge the utility in fulfilling the core ambition. Steady advection diffusion problem has been used as the test case to unders…
▽ More
The thesis focuses on various techniques to find an alternate approximation method that could be universally used for a wide range of CFD problems but with low computational cost and low runtime. Various techniques have been explored within the field of machine learning to gauge the utility in fulfilling the core ambition. Steady advection diffusion problem has been used as the test case to understand the level of complexity up to which a method can provide solution. Ultimately, the focus stays over physics informed machine learning techniques where solving differential equations is possible without any training with computed data. The prevalent methods by I.E. Lagaris et.al. and M. Raissi et.al are explored thoroughly. The prevalent methods cannot solve advection dominant problems. A physics informed method, called as Distributed Physics Informed Neural Network (DPINN), is proposed to solve advection dominant problems. It increases the lexibility and capability of older methods by splitting the domain and introducing other physics-based constraints as mean squared loss terms. Various experiments are done to explore the end to end possibilities with the method. Parametric study is also done to understand the behavior of the method to different tunable parameters. The method is tested over steady advection-diffusion problems and unsteady square pulse problems. Very accurate results are recorded. Extreme learning machine (ELM) is a very fast neural network algorithm at the cost of tunable parameters. The ELM based variant of the proposed model is tested over the advection-diffusion problem. ELM makes the complex optimization simpler and Since the method is non-iterative, the solution is recorded in a single shot. The ELM based variant seems to work better than the simple DPINN method. Simultaneously scope for various development in future are hinted throughout the thesis.
△ Less
Submitted 1 November, 2021;
originally announced November 2021.
-
Graph Neural Networks with Learnable Structural and Positional Representations
Authors:
Vijay Prakash Dwivedi,
Anh Tuan Luu,
Thomas Laurent,
Yoshua Bengio,
Xavier Bresson
Abstract:
Graph neural networks (GNNs) have become the standard learning architectures for graphs. GNNs have been applied to numerous domains ranging from quantum chemistry, recommender systems to knowledge graphs and natural language processing. A major issue with arbitrary graphs is the absence of canonical positional information of nodes, which decreases the representation power of GNNs to distinguish e.…
▽ More
Graph neural networks (GNNs) have become the standard learning architectures for graphs. GNNs have been applied to numerous domains ranging from quantum chemistry, recommender systems to knowledge graphs and natural language processing. A major issue with arbitrary graphs is the absence of canonical positional information of nodes, which decreases the representation power of GNNs to distinguish e.g. isomorphic nodes and other graph symmetries. An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers. Possible graph PE are Laplacian eigenvectors. In this work, we propose to decouple structural and positional representations to make easy for the network to learn these two essential properties. We introduce a novel generic architecture which we call LSPE (Learnable Structural and Positional Encodings). We investigate several sparse and fully-connected (Transformer-like) GNNs, and observe a performance increase for molecular datasets, from 1.79% up to 64.14% when considering learnable PE for both GNN classes.
△ Less
Submitted 10 February, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
A Generalization of Transformer Networks to Graphs
Authors:
Vijay Prakash Dwivedi,
Xavier Bresson
Abstract:
We propose a generalization of transformer neural network architecture for arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), which operates on fully connected graphs representing all connections between the words in a sequence. Such architecture does not leverage the graph connectivity inductive bias, and can perform poorly when the graph topology is im…
▽ More
We propose a generalization of transformer neural network architecture for arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), which operates on fully connected graphs representing all connections between the words in a sequence. Such architecture does not leverage the graph connectivity inductive bias, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity for each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, the layer normalization is replaced by a batch normalization layer, which provides faster training and better generalization performance. Finally, the architecture is extended to edge feature representation, which can be critical to tasks s.a. chemistry (bond type) or link prediction (entity relationship in knowledge graphs). Numerical experiments on a graph benchmark demonstrate the performance of the proposed graph transformer architecture. This work closes the gap between the original transformer, which was designed for the limited case of line graphs, and graph neural networks, that can work with arbitrary graphs. As our architecture is simple and generic, we believe it can be used as a black box for future applications that wish to consider transformer and graphs.
△ Less
Submitted 24 January, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
Impact of RF I/Q Imbalance on Interference-Limited Mixed RF/FSO TWR Systems with Non-Zero Boresight Error
Authors:
Abhijeet Upadhya,
Juhi Gupta,
Vivek K. Dwivedi,
Mohamed-Slim Alouini
Abstract:
In this letter, we investigate a generic model assessing the effect of in-phase/quadrature-phase imbalance (IQI) on an asymmetric dual hop radio frequency/free space optical (RF/FSO) two-way relay (TWR) system in the presence of multiple co-channel interferers (CCIs) at the relay. The fading on the RF and FSO links have been modeled using K-distribution and double generalized Gamma (D-GG) turbulen…
▽ More
In this letter, we investigate a generic model assessing the effect of in-phase/quadrature-phase imbalance (IQI) on an asymmetric dual hop radio frequency/free space optical (RF/FSO) two-way relay (TWR) system in the presence of multiple co-channel interferers (CCIs) at the relay. The fading on the RF and FSO links have been modeled using K-distribution and double generalized Gamma (D-GG) turbulence model, respectively. The impact of non-zero boresight pointing error and type of optical demodulation schemes have been considered on the FSO link. To this end, a closed-form probability density function (PDF) has been derived for the FSO link undergoing D-GG irradiance with non-zero boresight pointing error. Furthermore, the exact and high signal-to-noise (SNR) asymptotic expression for outage probability have been presented. To gain insights into the throughput of the system, approximate and asymptotic expressions for the achievable sum rate (ASR) have been derived. The results show dependency of reliability and throughput offered on IQI at the RF front-end and strength of interference for the considered TWR system.
△ Less
Submitted 8 November, 2020;
originally announced November 2020.
-
Benchmarking Graph Neural Networks
Authors:
Vijay Prakash Dwivedi,
Chaitanya K. Joshi,
Anh Tuan Luu,
Thomas Laurent,
Yoshua Bengio,
Xavier Bresson
Abstract:
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be deve…
▽ More
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises of a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest of exploring more powerful PE for Transformers and GNNs in a robust experimental setting.
△ Less
Submitted 27 December, 2022; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Distributed physics informed neural network for data-efficient solution to partial differential equations
Authors:
Vikas Dwivedi,
Nishant Parashar,
Balaji Srinivasan
Abstract:
The physics informed neural network (PINN) is evolving as a viable method to solve partial differential equations. In the recent past PINNs have been successfully tested and validated to find solutions to both linear and non-linear partial differential equations (PDEs). However, the literature lacks detailed investigation of PINNs in terms of their representation capability. In this work, we first…
▽ More
The physics informed neural network (PINN) is evolving as a viable method to solve partial differential equations. In the recent past PINNs have been successfully tested and validated to find solutions to both linear and non-linear partial differential equations (PDEs). However, the literature lacks detailed investigation of PINNs in terms of their representation capability. In this work, we first test the original PINN method in terms of its capability to represent a complicated function. Further, to address the shortcomings of the PINN architecture, we propose a novel distributed PINN, named DPINN. We first perform a direct comparison of the proposed DPINN approach against PINN to solve a non-linear PDE (Burgers' equation). We show that DPINN not only yields a more accurate solution to the Burgers' equation, but it is found to be more data-efficient as well. At last, we employ our novel DPINN to two-dimensional steady-state Navier-Stokes equation, which is a system of non-linear PDEs. To the best of the authors' knowledge, this is the first such attempt to directly solve the Navier-Stokes equation using a physics informed neural network.
△ Less
Submitted 21 July, 2019;
originally announced July 2019.
-
Physics Informed Extreme Learning Machine (PIELM) -- A rapid method for the numerical solution of partial differential equations
Authors:
Vikas Dwivedi,
Balaji Srinivasan
Abstract:
There has been rapid progress recently on the application of deep networks to the solution of partial differential equations, collectively labelled as Physics Informed Neural Networks (PINNs). In this paper, we develop Physics Informed Extreme Learning Machine (PIELM), a rapid version of PINNs which can be applied to stationary and time dependent linear partial differential equations. We demonstra…
▽ More
There has been rapid progress recently on the application of deep networks to the solution of partial differential equations, collectively labelled as Physics Informed Neural Networks (PINNs). In this paper, we develop Physics Informed Extreme Learning Machine (PIELM), a rapid version of PINNs which can be applied to stationary and time dependent linear partial differential equations. We demonstrate that PIELM matches or exceeds the accuracy of PINNs on a range of problems. We also discuss the limitations of neural network based approaches, including our PIELM, in the solution of PDEs on large domains and suggest an extension, a distributed version of our algorithm -{}- DPIELM. We show that DPIELM produces excellent results comparable to conventional numerical techniques in the solution of time-dependent problems. Collectively, this work contributes towards making the use of neural networks in the solution of partial differential equations in complex domains as a competitive alternative to conventional discretization techniques.
△ Less
Submitted 8 July, 2019;
originally announced July 2019.
-
A Watermarking Technique Using Discrete Curvelet Transform for Security of Multiple Biometric Features
Authors:
Rohit M. Thanki,
Ved Vyas Dwivedi,
Komal R. Borisagar
Abstract:
The robustness and security of the biometric watermarking approach can be improved by using a multiple watermarking. This multiple watermarking proposed for improving security of biometric features and data. When the imposter tries to create the spoofed biometric feature, the invisible biometric watermark features can provide appropriate protection to multimedia data. In this paper, a biometric wa…
▽ More
The robustness and security of the biometric watermarking approach can be improved by using a multiple watermarking. This multiple watermarking proposed for improving security of biometric features and data. When the imposter tries to create the spoofed biometric feature, the invisible biometric watermark features can provide appropriate protection to multimedia data. In this paper, a biometric watermarking technique with multiple biometric watermarks are proposed in which biometric features of fingerprint, face, iris and signature is embedded in the image. Before embedding, fingerprint, iris, face and signature features are extracted using Shen-Castan edge detection and Principal Component Analysis. These all biometric watermark features are embedded into various mid band frequency curvelet coefficients of host image. All four fingerprint features, iris features, facial features and signature features are the biometric characteristics of the individual and they are used for cross verification and copyright protection if any manipulation occurs. The proposed technique is fragile enough; features cannot be extracted from the watermarked image when an imposter tries to remove watermark features illegally. It can use for multiple copyright authentication and verification.
△ Less
Submitted 16 January, 2017;
originally announced January 2017.
-
Foundations and Tools for End-User Architecting
Authors:
David Garlan,
Vishal Dwivedi,
Ivan Ruchkin,
Bradley Schmerl
Abstract:
Within an increasing number of domains an important emerging need is the ability for technically naive users to compose computational elements into novel configurations. Examples include astronomers who create new analysis pipelines to process telescopic data, intelligence analysts who must process diverse sources of unstructured text to discover socio-technical trends, and medical researchers who…
▽ More
Within an increasing number of domains an important emerging need is the ability for technically naive users to compose computational elements into novel configurations. Examples include astronomers who create new analysis pipelines to process telescopic data, intelligence analysts who must process diverse sources of unstructured text to discover socio-technical trends, and medical researchers who have to process brain image data in new ways to understand disease pathways. Creating such compositions today typically requires low-level technical expertise, limiting the use of computational methods and increasing the cost of using them. In this paper we describe an approach - which we term end-user architecting - that exploits the similarity between such compositional activities and those of software architects. Drawing on the rich heritage of software architecture languages, methods, and tools, we show how those techniques can be adapted to support end users in composing rich computational systems through domain-specific compositional paradigms and component repositories, without requiring that they have knowledge of the low-level implementation details of the components or the compositional infrastructure. Further, we outline a set of open research challenges that the area of end-user architecting raises.
△ Less
Submitted 17 October, 2012;
originally announced October 2012.