Vietnamese Legal Information Retrieval in Question-Answering System
Authors:
Thiem Nguyen Ba,
Vinh Doan The,
Tung Pham Quang,
Toan Tran Van
Abstract:
In the modern era of rapidly increasing data volumes, accurately retrieving and recommending relevant documents has become crucial to the reliability of Question Answering (QA) systems. Recently, Retrieval Augmented Generation (RAG) has gained significant recognition for enhancing the capabilities of large language models (LLMs) by mitigating hallucination in QA systems, which is particularly beneficial in the legal domain. Various methods have been proposed, such as semantic search using dense vector embeddings or combinations of multiple techniques that improve results before they are fed to LLMs. However, these methods often fall short when applied to the Vietnamese language because of several challenges, namely inefficient Vietnamese data processing that leads to excessive token length, and overly simplistic ensemble techniques that lead to instability and limited improvement. Moreover, a critical issue that is often overlooked is the ordering of the final relevant documents, which are used as references to ensure the accuracy of the answers provided by LLMs. In this report, we introduce three main modifications that address these challenges. First, we explore various practical approaches to data processing to overcome the limitations of the embedding model. Second, we enhance Reciprocal Rank Fusion by normalizing order so that results from keyword and vector searches are combined effectively. Third, we meticulously re-rank the source pieces of information used by LLMs with Active Retrieval to improve the user experience when refining the generated information. In our opinion, this technique can also be considered a new re-ranking method that might be used in place of the traditional cross encoder. Finally, we integrate these techniques into a comprehensive QA system, significantly improving its performance and reliability.
Submitted 4 September, 2024;
originally announced September 2024.
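For readers unfamiliar with the fusion step named in the abstract above, the following is a minimal sketch of standard Reciprocal Rank Fusion, which merges a keyword-search ranking and a vector-search ranking by summing 1 / (k + rank) per document. It is not the authors' order-normalized variant, whose details the abstract does not give. The function name, the document ids, and the constant k = 60 (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with standard RRF.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (assumed default; not taken from the abstract).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword search and a vector search.
keyword_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear near the top of both lists (here d1 and d3) rise above documents that appear in only one list, which is the behaviour that makes rank fusion attractive when keyword and vector scores are not directly comparable.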
Analytic Performance Evaluation of Underlay Relay Cognitive Networks with Channel Estimation Errors
Authors:
Khuong Ho-Van,
Paschalis C. Sofotasios,
Son Vo Que,
Tuan Dang Anh,
Thai Pham Quang,
Lien Pham Hong
Abstract:
This paper evaluates the bit error rate (BER) performance of underlay relay cognitive networks with decode-and-forward (DF) relays over an arbitrary number of hops in Rayleigh fading with channel estimation errors. To facilitate analytical performance evaluation, we derive a novel exact closed-form representation for the corresponding BER, which is validated through extensive comparisons with results from Monte-Carlo simulations. The proposed expression involves well-known elementary and special functions, which renders its computational realization rather simple and straightforward. As a result, the need for laborious, energy-exhaustive and time-consuming computer simulations can ultimately be avoided. Numerous results illustrate that the performance of underlay relay cognitive networks is, as expected, significantly degraded by channel estimation errors and that it is highly dependent upon both the network topology and the number of hops.
Submitted 28 December, 2016; v1 submitted 14 May, 2015;
originally announced May 2015.