Search | arXiv e-print repository

arXiv:2501.19364 [pdf, other]

CoSTI: Consistency Models for (a faster) Spatio-Temporal Imputation

Authors: Javier Solís-García, Belén Vega-Márquez, Juan A. Nepomuceno, Isabel A. Nepomuceno-Chamorro

Abstract: Multivariate Time Series Imputation (MTSI) is crucial for many applications, such as healthcare monitoring and traffic management, where incomplete data can compromise decision-making. Existing state-of-the-art methods, like Denoising Diffusion Probabilistic Models (DDPMs), achieve high imputation accuracy; however, they suffer from significant computational costs and are notably time-consuming du… ▽ More Multivariate Time Series Imputation (MTSI) is crucial for many applications, such as healthcare monitoring and traffic management, where incomplete data can compromise decision-making. Existing state-of-the-art methods, like Denoising Diffusion Probabilistic Models (DDPMs), achieve high imputation accuracy; however, they suffer from significant computational costs and are notably time-consuming due to their iterative nature. In this work, we propose CoSTI, an innovative adaptation of Consistency Models (CMs) for the MTSI domain. CoSTI employs Consistency Training to achieve comparable imputation quality to DDPMs while drastically reducing inference times, making it more suitable for real-time applications. We evaluate CoSTI across multiple datasets and missing data scenarios, demonstrating up to a 98% reduction in imputation time with performance on par with diffusion-based models. This work bridges the gap between efficiency and accuracy in generative imputation tasks, providing a scalable solution for handling missing data in critical spatio-temporal systems. △ Less

Submitted 31 January, 2025; originally announced January 2025.

Comments: 20 pages, 5 figures, 13 tables

arXiv:2412.17964 [pdf, other]

Dynamic Multi-Agent Orchestration and Retrieval for Multi-Source Question-Answer Systems using Large Language Models

Authors: Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz

Abstract: We propose a methodology that combines several advanced techniques in Large Language Model (LLM) retrieval to support the development of robust, multi-source question-answer systems. This methodology is designed to integrate information from diverse data sources, including unstructured documents (PDFs) and structured databases, through a coordinated multi-agent orchestration and dynamic retrieval… ▽ More We propose a methodology that combines several advanced techniques in Large Language Model (LLM) retrieval to support the development of robust, multi-source question-answer systems. This methodology is designed to integrate information from diverse data sources, including unstructured documents (PDFs) and structured databases, through a coordinated multi-agent orchestration and dynamic retrieval approach. Our methodology leverages specialized agents-such as SQL agents, Retrieval-Augmented Generation (RAG) agents, and router agents - that dynamically select the most appropriate retrieval strategy based on the nature of each query. To further improve accuracy and contextual relevance, we employ dynamic prompt engineering, which adapts in real time to query-specific contexts. The methodology's effectiveness is demonstrated within the domain of Contract Management, where complex queries often require seamless interaction between unstructured and structured data. Our results indicate that this approach enhances response accuracy and relevance, offering a versatile and scalable framework for developing question-answer systems that can operate across various domains and data sources. △ Less

Submitted 23 December, 2024; originally announced December 2024.

Comments: International Conference on NLP, AI, Computer Science & Engineering (NLAICSE 2024)

arXiv:2412.17942 [pdf, other]

Contrato360 2.0: A Document and Database-Driven Question-Answer System using Large Language Models and Agents

Authors: Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz

Abstract: We present a question-and-answer (Q\&A) application designed to support the contract management process by leveraging combined information from contract documents (PDFs) and data retrieved from contract management systems (database). This data is processed by a large language model (LLM) to provide precise and relevant answers. The accuracy of these responses is further enhanced through the use of… ▽ More We present a question-and-answer (Q\&A) application designed to support the contract management process by leveraging combined information from contract documents (PDFs) and data retrieved from contract management systems (database). This data is processed by a large language model (LLM) to provide precise and relevant answers. The accuracy of these responses is further enhanced through the use of Retrieval-Augmented Generation (RAG), text-to-SQL techniques, and agents that dynamically orchestrate the workflow. These techniques eliminate the need to retrain the language model. Additionally, we employed Prompt Engineering to fine-tune the focus of responses. Our findings demonstrate that this multi-agent orchestration and combination of techniques significantly improve the relevance and accuracy of the answers, offering a promising direction for future information systems. △ Less

Submitted 23 December, 2024; originally announced December 2024.

Comments: KDIR 2024 - Knowledge Discovery and Information Retrieval

arXiv:2410.05916 [pdf, other]

TIMBA: Time series Imputation with Bi-directional Mamba Blocks and Diffusion models

Authors: Javier Solís-García, Belén Vega-Márquez, Juan A. Nepomuceno, Isabel A. Nepomuceno-Chamorro

Abstract: The problem of imputing multivariate time series spans a wide range of fields, from clinical healthcare to multi-sensor systems. Initially, Recurrent Neural Networks (RNNs) were employed for this task; however, their error accumulation issues led to the adoption of Transformers, leveraging attention mechanisms to mitigate these problems. Concurrently, the promising results of diffusion models in c… ▽ More The problem of imputing multivariate time series spans a wide range of fields, from clinical healthcare to multi-sensor systems. Initially, Recurrent Neural Networks (RNNs) were employed for this task; however, their error accumulation issues led to the adoption of Transformers, leveraging attention mechanisms to mitigate these problems. Concurrently, the promising results of diffusion models in capturing original distributions have positioned them at the forefront of current research, often in conjunction with Transformers. In this paper, we propose replacing time-oriented Transformers with State-Space Models (SSM), which are better suited for temporal data modeling. Specifically, we utilize the latest SSM variant, S6, which incorporates attention-like mechanisms. By embedding S6 within Mamba blocks, we develop a model that integrates SSM, Graph Neural Networks, and node-oriented Transformers to achieve enhanced spatiotemporal representations. Implementing these architectural modifications, previously unexplored in this field, we present Time series Imputation with Bi-directional mamba blocks and diffusion models (TIMBA). TIMBA achieves superior performance in almost all benchmark scenarios and performs comparably in others across a diverse range of missing value situations and three real-world datasets. We also evaluate how the performance of our model varies with different amounts of missing values and analyse its performance on downstream tasks. In addition, we provide the original code to replicate the results. △ Less

Submitted 8 October, 2024; originally announced October 2024.

Comments: 14 pages, 7 tables and 2 figures

Showing 1–4 of 4 results for author: Nepomuceno, J