Transformer-based Ranking Approaches for Keyword Queries over Relational Databases

Authors: Paulo Martins, Altigran da Silva, Johny Moreira, Edleno de Moura

Abstract: Relational Keyword Search (R-KwS) systems enable naive/informal users to explore and retrieve information from relational databases without requiring schema knowledge or query-language proficiency. Although numerous R-KwS methods have been proposed, most still focus on queries referring only to attribute values or primarily address performance enhancements, providing limited support for queries referencing schema elements. We previously introduced Lathe, a system that accommodates schema-based keyword queries and employs an eager CJN evaluation strategy to filter out spurious Candidate Joining Networks (CJNs). However, Lathe still faces challenges in accurately ranking CJNs when queries are ambiguous. In this work, we propose a new transformer-based ranking approach that provides a more context-aware evaluation of Query Matches (QMs) and CJNs. Our solution introduces a linearization process to convert relational structures into textual sequences suitable for transformer models. It also includes a data augmentation strategy aimed at handling diverse and ambiguous queries more effectively. Experimental results, comparing our transformer-based ranking to Lathe's original Bayesian-based method, show significant improvements in recall and R@k, demonstrating the effectiveness of our neural approach in delivering the most relevant query results.

Submitted 24 March, 2025; originally announced March 2025.

arXiv:2502.09276

doi 10.1016/j.icte.2024.10.009

Transactional Dynamics in Hyperledger Fabric: A Stochastic Modeling and Performance Evaluation of Permissioned Blockchains

Authors: Carlos Melo, Glauber Gonçalves, Francisco Airton Silva, Iure Fé, Ericksulino Moura, André Soares, Eunmi Choi, Dugki Min, Jae-Woo Lee, Tuan Anh Nguyen

Abstract: Blockchain, often integrated with distributed systems and security enhancements, has significant potential in various industries. However, environmental concerns and the efficiency of consortia-controlled permissioned networks remain critical issues. We use a Stochastic Petri Net model to analyze transaction flows in Hyperledger Fabric networks, achieving a 95% confidence interval for response times. This model enables administrators to assess the impact of system changes on resource utilization. Sensitivity analysis reveals major factors influencing response times and throughput. Our case studies demonstrate that block size can alter throughput and response times by up to 200%, underscoring the need for performance optimization with resource efficiency.

Submitted 13 February, 2025; originally announced February 2025.

arXiv:2411.14551

An Experimental Study on Data Augmentation Techniques for Named Entity Recognition on Low-Resource Domains

Authors: Arthur Elwing Torres, Edleno Silva de Moura, Altigran Soares da Silva, Mario A. Nascimento, Filipe Mesquita

Abstract: Named Entity Recognition (NER) is a machine learning task that traditionally relies on supervised learning and annotated data. Acquiring such data is often a challenge, particularly in specialized fields like medical, legal, and financial sectors. Those are commonly referred to as low-resource domains, which comprise long-tail entities, due to the scarcity of available data. To address this, data augmentation techniques are increasingly being employed to generate additional training instances from the original dataset. In this study, we evaluate the effectiveness of two prominent text augmentation techniques, Mention Replacement and Contextual Word Replacement, on two widely-used NER models, Bi-LSTM+CRF and BERT. We conduct experiments on four datasets from low-resource domains, and we explore the impact of various combinations of training subset sizes and number of augmented examples. We not only confirm that data augmentation is particularly beneficial for smaller datasets, but we also demonstrate that there is no universally optimal number of augmented examples, i.e., NER practitioners must experiment with different quantities in order to fine-tune their projects.

Submitted 21 November, 2024; originally announced November 2024.

Comments: 21 pages, 2 figures

arXiv:2405.05129

Web Intelligence Journal in perspective: an analysis of its two decades trajectory

Authors: Diogenes Ademir Domingos, Victor Emanuel Santos Moura, Antonio Fernando Lavareda Jacob Junior, Fabio Manoel Franca Lobato

Abstract: The evolution of a thematic area undergoes various changes of perspective and adopts new theoretical approaches that arise from the interactions of the community and a wide range of social needs. The advent of digital technologies, such as social networks, underlines this factor by spreading knowledge and forging links between different communities. Web intelligence is now on the verge of raising questions that broaden the understanding of how artificial intelligence impacts the Web of People, Data, and Things, among other factors. To the best of our knowledge, there is no study that has conducted a longitudinal analysis of the evolution of this community. Thus, we investigate in this paper how Web intelligence has evolved in the last twenty years by carrying out a literature review and bibliometric analysis. Concerning the impact of this research study, increasing attention is devoted to determining which are the most influential papers in the community by referring to citation networks and discovering the most popular and pressing topics through a co-citation analysis and the keywords co-occurrence. The results obtained can guide the direction of new research projects in the area and update the scope and places of interest found in current trends and the relevant journals.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2203.05921

Supporting Schema References in Keyword Queries over Relational Databases

Authors: Paulo Martins, Altigran da Silva, João Cavalcanti, Edleno de Moura

Abstract: Relational Keyword Search (R-KwS) systems enable naive/informal users to explore and retrieve information from relational databases without knowing schema details or query languages. These systems take the keywords from the input query, locate the elements of the target database that correspond to these keywords, and look for ways to "connect" these elements using information on referential integrity constraints, i.e., key/foreign key pairs. Although several such systems have been proposed in the literature, most of them only support queries whose keywords refer to the contents of the target database and just very few support queries in which keywords refer to elements of the database schema. This paper proposes LATHE, a novel R-KwS designed to support such queries. To this end, in our work, we first generalize the well-known concepts of Query Matches (QMs) and Candidate Joining Networks (CJNs) to handle keywords referring to schema elements and propose new algorithms to generate them. Then, we introduce an approach to automatically select the CJNs that are more likely to represent the user intent when issuing a keyword query. This approach includes two major innovations: a ranking algorithm for selecting better QMs, yielding the generation of fewer but better CJNs, and an eager evaluation strategy for pruning void useless CJNs. We present a comprehensive set of experiments performed with query sets and datasets previously used in experiments with state-of-the-art R-KwS systems and methods. Our results indicate that LATHE can handle a wider variety of keyword queries while remaining highly effective, even for large databases with intricate schemas.

Submitted 11 March, 2022; originally announced March 2022.

ACM Class: H.2; H.3.3

arXiv:2004.09741

doi 10.1016/j.infsof.2020.106294

On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering

Authors: Erica Mourão, João Felipe Pimentel, Leonardo Murta, Marcos Kalinowski, Emilia Mendes, Claes Wohlin

Abstract: Context: When conducting a Systematic Literature Review (SLR), researchers usually face the challenge of designing a search strategy that appropriately balances result quality and review effort. Using digital library (or database) searches or snowballing alone may not be enough to achieve high-quality results. On the other hand, using both digital library searches and snowballing together may increase the overall review effort. Objective: The goal of this research is to propose and evaluate hybrid search strategies that selectively combine database searches with snowballing. Method: We propose four hybrid search strategies combining database searches in digital libraries with iterative, parallel, or sequential backward and forward snowballing. We simulated the strategies over three existing SLRs in SE that adopted both database searches and snowballing. We compared the outcome of digital library searches, snowballing, and hybrid strategies using precision, recall, and F-measure to investigate the performance of each strategy. Results: Our results show that, for the analyzed SLRs, combining database searches from the Scopus digital library with parallel or sequential snowballing achieved the most appropriate balance of precision and recall. Conclusion: We put forward that, depending on the goals of the SLR and the available resources, using a hybrid search strategy involving a representative digital library and parallel or sequential snowballing tends to represent an appropriate alternative to be used when searching for evidence in SLRs.

Submitted 20 April, 2020; originally announced April 2020.

Comments: Accepted for publication at the Information and Software Technology Journal

Showing 1–6 of 6 results for author: Mourão, E