Search | arXiv e-print repository

RAE: A Rule-Driven Approach for Attribute Embedding in Property Graph Recommendation

Authors: Sibo Zhao, Michael Bewong, Selasi Kwashie, Junwei Hu, Zaiwen Feng

Abstract: Recommendation systems are crucial in modern applications to enhance the user experience and drive business conversion rates through personalization. However, insufficient utilization of attribute information within the property graph remains a significant challenge. Most existing graph convolutional network (GCN) models do not consider attribute information, and those that do often employ a simplified triple format <users, items, attributes>, which fails to fully exploit the rich semantic structures of property graphs necessary for effective recommendations. To overcome these limitations, we introduce Rule-Driven Approach for Attribute Embedding (RAE), a novel methodology that enhances recommendation performance by effectively mining and utilizing semantic rules from property graphs. RAE applies a rule-mining process to extract meaningful rules that guide random walks in generating enriched attribute embeddings. These enriched embeddings are subsequently integrated into GCNs, surpassing conventional triple-based embedding techniques. We evaluate RAE on real-world datasets (e.g., Blogcatalog and Flickr) and demonstrate that RAE achieves an average improvement of 10.6% in both Recall@20 and NDCG@20 compared to state-of-the-art baselines, indicating superior relevance coverage and ranking rationality in top-20 recommendations. Additionally, RAE exhibits enhanced robustness against data sparsity and the attribute missingness problem. Our novel approach underscores the significant performance gains achieved in recommendation systems by fully leveraging attribute information within property graphs, enhancing both effectiveness and reliability.

Submitted 26 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

Comments: ECML-PKDD2025
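The RAE abstract reports results in Recall@20 and NDCG@20. As a reading aid, here is a minimal sketch of how these two metrics are commonly computed for a single user under binary relevance. The function names and the toy data are illustrative assumptions and are not taken from the paper; the authors' exact evaluation protocol may differ.

```python
import math

def recall_at_k(ranked, relevant, k=20):
    """Fraction of the user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k=20):
    """Binary-relevance NDCG: discounted gain of hits over the ideal ordering."""
    # Position i (0-indexed) contributes 1 / log2(i + 2) when it holds a relevant item.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy data: a ranked recommendation list and the held-out relevant items.
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"a", "c", "z"}
toy_recall = recall_at_k(ranked_items, relevant_items, k=3)
toy_ndcg = ndcg_at_k(ranked_items, relevant_items, k=3)
```

Recall@K rewards coverage of relevant items regardless of position, while NDCG@K also rewards placing them near the top, which is why the two are usually reported together.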

arXiv:2504.01557

FastER: Fast On-Demand Entity Resolution in Property Graphs

Authors: Shujing Wang, Selasi Kwashie, Michael Bewong, Junwei Hu, Vincent M. Nofong, Shiqi Miao, Zaiwen Feng

Abstract: Entity resolution (ER) is the problem of identifying and linking database records that refer to the same real-world entity. Traditional ER methods use batch processing, which becomes impractical with growing data volumes due to high computational costs and lack of real-time capabilities. In many applications, users need to resolve entities for only a small portion of their data, making full data processing unnecessary -- a scenario known as "ER-on-demand". This paper proposes FastER, an efficient ER-on-demand framework for property graphs. Our approach uses graph differential dependencies (GDDs) as a knowledge encoding language to design effective filtering mechanisms that leverage both structural and attribute semantics of graphs. We construct a blocking graph from filtered subgraphs to reduce the number of candidate entity pairs requiring comparison. Additionally, FastER incorporates Progressive Profile Scheduling (PPS), allowing the system to incrementally produce results throughout the resolution process. Extensive evaluations on multiple benchmark datasets demonstrate that FastER significantly outperforms state-of-the-art ER methods in computational efficiency and real-time processing for on-demand tasks while ensuring reliability. We make FastER publicly available at: https://anonymous.4open.science/r/On_Demand_Entity_Resolution-9DFB

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2411.00801

A Heterogeneous Network-based Contrastive Learning Approach for Predicting Drug-Target Interaction

Authors: Junwei Hu, Michael Bewong, Selasi Kwashie, Wen Zhang, Vincent M. Nofong, Guangsheng Wu, Zaiwen Feng

Abstract: Drug-target interaction (DTI) prediction is crucial for drug development and repositioning. Methods using heterogeneous graph neural networks (HGNNs) for DTI prediction have become a promising approach, with attention-based models often achieving excellent performance. However, these methods typically overlook edge features when dealing with heterogeneous biomedical networks. We propose a heterogeneous network-based contrastive learning method called HNCL-DTI, which designs a heterogeneous graph attention network to predict potential/novel DTIs. Specifically, our HNCL-DTI utilizes contrastive learning to collaboratively learn node representations from the perspective of both node-based and edge-based attention within the heterogeneous structure of biomedical networks. Experimental results show that HNCL-DTI outperforms existing advanced baseline methods on benchmark datasets, demonstrating strong predictive ability and practical effectiveness. The data and source code are available at https://github.com/Zaiwen/HNCL-DTI.

Submitted 20 October, 2024; originally announced November 2024.

arXiv:2410.15747

GIG: Graph Data Imputation With Graph Differential Dependencies

Authors: Jiang Hua, Michael Bewong, Selasi Kwashie, MD Geaur Rahman, Junwei Hu, Xi Guo, Zaiwen Feng

Abstract: Data imputation addresses the challenge of imputing missing values in database instances, ensuring consistency with the overall semantics of the dataset. Although several heuristics that rely on statistical methods and ad-hoc rules have been proposed, these do not generalise well and often lack data context. Consequently, they also lack explainability. The existing techniques also mostly focus on the relational data context, making them unsuitable for wider application contexts such as graph data. In this paper, we propose a graph data imputation approach called GIG which relies on graph differential dependencies (GDDs). GIG learns the GDDs from a given knowledge graph, and uses these rules to train a transformer model which then predicts the value of missing data within the graph. By leveraging GDDs, GIG incorporates semantic knowledge into the data imputation process, making it more reliable and explainable. Experimental results on seven real-world datasets highlight GIG's effectiveness compared to existing state-of-the-art approaches.

Submitted 21 October, 2024; originally announced October 2024.

Comments: 12 pages, 4 figures, published to ADC

arXiv:2410.04783

When GDD meets GNN: A Knowledge-driven Neural Connection for Effective Entity Resolution in Property Graphs

Authors: Junwei Hu, Michael Bewong, Selasi Kwashie, Yidi Zhang, Vincent Nofong, John Wondoh, Zaiwen Feng

Abstract: This paper studies the entity resolution (ER) problem in property graphs. ER is the task of identifying and linking different records that refer to the same real-world entity. It is commonly used in data integration, data cleansing, and other applications where it is important to have accurate and consistent data. In general, two predominant approaches exist in the literature: rule-based and learning-based methods. On the one hand, rule-based techniques are often desired due to their explainability and ability to encode domain knowledge. Learning-based methods, on the other hand, are preferred due to their effectiveness in spite of their black-box nature. In this work, we devise a hybrid ER solution, GraphER, that leverages the strengths of both systems for property graphs. In particular, we adopt graph differential dependency (GDD) for encoding the so-called record-matching rules, and employ them to guide a graph neural network (GNN) based representation learning for the task. We conduct extensive empirical evaluation of our proposal on benchmark ER datasets including 17 graph datasets and 7 relational datasets in comparison with 10 state-of-the-art (SOTA) techniques. The results show that our approach provides a significantly better solution to addressing ER in graph data, both quantitatively and qualitatively, while attaining highly competitive results on the benchmark relational datasets w.r.t. the SOTA solutions.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2409.12428

Is it Still Fair? A Comparative Evaluation of Fairness Algorithms through the Lens of Covariate Drift

Authors: Oscar Blessed Deho, Michael Bewong, Selasi Kwashie, Jiuyong Li, Jixue Liu, Lin Liu, Srecko Joksimovic

Abstract: Over the last few decades, machine learning (ML) applications have grown exponentially, yielding several benefits to society. However, these benefits are tempered with concerns of discriminatory behaviours exhibited by ML models. In this regard, fairness in machine learning has emerged as a priority research area. Consequently, several fairness metrics and algorithms have been developed to mitigate discriminatory behaviours that ML models may possess. Yet still, very little attention has been paid to the problem of naturally occurring changes in data patterns (aka data distributional drift), and its impact on fairness algorithms and metrics. In this work, we study this problem comprehensively by analyzing 4 fairness-unaware baseline algorithms and 7 fairness-aware algorithms, carefully curated to cover the breadth of its typology, across 5 datasets including public and proprietary data, and evaluating them using 3 predictive performance and 10 fairness metrics. In doing so, we show that (1) data distributional drift is not a trivial occurrence, and in several cases can lead to serious deterioration of fairness in so-called fair models; (2) contrary to some existing literature, the size and direction of data distributional drift are not correlated to the resulting size and direction of unfairness; and (3) the choice and training of fairness algorithms are impacted by the effect of data distributional drift, which is largely ignored in the literature. Emanating from our findings, we synthesize several policy implications of data distributional drift on fairness algorithms that can be very relevant to stakeholders and practitioners.

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2409.08522

MAPX: An explainable model-agnostic framework for the detection of false information on social media networks

Authors: Sarah Condran, Michael Bewong, Selasi Kwashie, Md Zahidul Islam, Irfan Altas, Joshua Condran

Abstract: The automated detection of false information has become a fundamental task in combating the spread of "fake news" on online social media networks (OSMN) as it reduces the need for manual discernment by individuals. In the literature, leveraging various content or context features of OSMN documents has been found useful. However, most of the existing detection models utilise these features in isolation without regard to the temporal and dynamic changes oft-seen in reality, thus limiting the robustness of the models. Furthermore, there has been little to no consideration of the impact of the quality of documents' features on the trustworthiness of the final prediction. In this paper, we introduce a novel model-agnostic framework, called MAPX, which allows evidence-based aggregation of predictions from existing models in an explainable manner. Indeed, the developed aggregation method is adaptive, dynamic and considers the quality of OSMN document features. Further, we perform extensive experiments on benchmarked fake news datasets to demonstrate the effectiveness of MAPX using various real-world data quality scenarios. Our empirical results show that the proposed framework consistently outperforms all state-of-the-art models evaluated. For reproducibility, a demo of MAPX is available at https://github.com/SCondran/MAPX_framework

Submitted 12 September, 2024; originally announced September 2024.

Comments: 16 pages, 5 figures

arXiv:2304.02323

FASTAGEDS: Fast Approximate Graph Entity Dependency Discovery

Authors: Guangtong Zhou, Selasi Kwashie, Yidi Zhang, Michael Bewong, Vincent M. Nofong, Debo Cheng, Keqing He, Zaiwen Feng

Abstract: This paper studies the discovery of approximate rules in property graphs. We propose a semantically meaningful measure of error for mining graph entity dependencies (GEDs) that almost hold, to tolerate errors and inconsistencies that exist in real-world graphs. We present a new characterisation of GED satisfaction, and devise a depth-first search strategy to traverse the search space of candidate rules efficiently. Further, we perform experiments to demonstrate the feasibility and scalability of our solution, FASTAGEDS, with three real-world graphs.

Submitted 8 April, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: 7 pages, 5 figures. arXiv admin note: text overlap with arXiv:2301.06264

arXiv:2301.06264

An Efficient Approach for Discovering Graph Entity Dependencies (GEDs)

Authors: Dehua Liu, Selasi Kwashie, Yidi Zhang, Guangtong Zhou, Michael Bewong, Xiaoying Wu, Xi Guo, Keqing He, Zaiwen Feng

Abstract: Graph entity dependencies (GEDs) are novel graph constraints, unifying keys and functional dependencies, for property graphs. They have been found useful in many real-world data quality and data management tasks, including fact checking on social media networks and entity resolution. In this paper, we study the discovery problem of GEDs -- finding a minimal cover of valid GEDs in given graph data. We formalise the problem, and propose an effective and efficient approach to overcome major bottlenecks in GED discovery. In particular, we leverage existing graph partitioning algorithms to enable fast GED-scope discovery, and employ effective pruning strategies over the prohibitively large space of candidate dependencies. Furthermore, we define an interestingness measure for GEDs based on the minimum description length principle, to score and rank the mined cover set of GEDs. Finally, we demonstrate the scalability and effectiveness of our GED discovery approach through extensive experiments on real-world benchmark graph data sets; and present the usefulness of the discovered rules in different downstream data quality management applications.

Submitted 30 June, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

arXiv:1908.04464

Linking Graph Entities with Multiplicity and Provenance

Authors: Jixue Liu, Selasi Kwashie, Jiuyong Li, Lin Liu, Michael Bewong

Abstract: Entity linking and resolution is a fundamental database problem with applications in data integration, data cleansing, information retrieval, knowledge fusion, and knowledge-base population. It is the task of accurately identifying multiple, differing, and possibly contradicting representations of the same real-world entity in data. In this work, we propose an entity linking and resolution system capable of linking entities across different databases and mentioned-entities extracted from text data. Our entity linking/resolution solution, called Certus, uses a graph model to represent the profiles of entities. The graph model is versatile, thus, it is capable of handling multiple values for an attribute or a relationship, as well as the provenance descriptions of the values. Provenance descriptions of a value provide the settings of the value, such as validity periods, sources, security requirements, etc. This paper presents the architecture for the entity linking system, the logical, physical, and indexing models used in the system, and the general linking process. Furthermore, we demonstrate the performance of update operations of the physical storage models when the system is implemented in two state-of-the-art database management systems, HBase and Postgres.

Submitted 25 November, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

Comments: 7 pages, 5 figures

arXiv:1309.3733

Discovery of Approximate Differential Dependencies

Authors: Jixue Liu, Selasi Kwashie, Jiuyong Li, Feiyue Ye, Millist Vincent

Abstract: Differential dependencies (DDs) capture the relationships between data columns of relations. They are more general than functional dependencies (FDs); the difference is that DDs are defined on the distances between values of two tuples, not directly on the values. Because of this difference, the algorithms for discovering FDs from data find only special DDs, not all DDs, and therefore are not applicable to DD discovery. In this paper, we propose an algorithm to discover DDs from data by fixing the left-hand side of a candidate DD to determine the right-hand side. We also show some properties of DDs and conduct a comprehensive analysis on how sampling affects the DDs discovered from data.

Submitted 15 September, 2013; originally announced September 2013.
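The abstract above describes DDs as constraints on the distances between the values of two tuples. A minimal sketch of checking whether one such dependency holds on a small relation follows. It assumes numeric attributes, absolute difference as the distance, and upper-bound thresholds on both sides; the function name `dd_holds` and the sample rows are hypothetical and do not come from the paper, which concerns discovery rather than validation.

```python
from itertools import combinations

def dd_holds(rows, lhs, rhs):
    """Check a differential dependency on a list of dict-shaped tuples.

    lhs and rhs map attribute names to a maximum allowed absolute difference
    (an assumed, simplified form of a distance constraint). The DD holds if
    every pair of tuples that is close on all lhs attributes is also close
    on all rhs attributes.
    """
    for t1, t2 in combinations(rows, 2):
        lhs_ok = all(abs(t1[a] - t2[a]) <= d for a, d in lhs.items())
        if lhs_ok and not all(abs(t1[a] - t2[a]) <= d for a, d in rhs.items()):
            return False  # this pair satisfies the left side but violates the right side
    return True

# Hypothetical relation: prices observed on given days.
rows = [
    {"day": 1, "price": 100},
    {"day": 2, "price": 104},
    {"day": 9, "price": 150},
]

# "If two observations are at most 1 day apart, their prices differ by at most 5."
loose_dd = dd_holds(rows, {"day": 1}, {"price": 5})
# The same left side with a tighter right side is violated by the first two rows.
tight_dd = dd_holds(rows, {"day": 1}, {"price": 3})
```

Setting every threshold to 0 reduces this check to a functional dependency test, which illustrates why FDs are a special case of DDs.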

Showing 1–11 of 11 results for author: Kwashie, S