Showing 1–14 of 14 results for author: Batista, E

Searching in archive cs.
  1. arXiv:2505.12621  [pdf, ps, other]

    cs.CL cs.IR

    Think Before You Attribute: Improving the Performance of LLMs Attribution Systems

    Authors: João Eduardo Batista, Emil Vatai, Mohamed Wahib

    Abstract: Large Language Models (LLMs) are increasingly applied in various science domains, yet their broader adoption remains constrained by a critical challenge: the lack of trustworthy, verifiable outputs. Current LLMs often generate answers without reliable source attribution, or worse, with incorrect attributions, posing a barrier to their use in scientific and high-stakes settings, where traceability…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: 22 pages (9 pages of content, 4 pages of references, 9 pages of supplementary material), 7 figures, 10 tables

  2. arXiv:2503.21155  [pdf, other]

    cs.LG

    Embedding Domain-Specific Knowledge from LLMs into the Feature Engineering Pipeline

    Authors: João Eduardo Batista

    Abstract: Feature engineering is mandatory in the machine learning pipeline to obtain robust models. While evolutionary computation is well-known for its great results both in feature selection and feature construction, its methods are computationally expensive due to the large number of evaluations required to induce the final model. Part of the reason why these algorithms require a large number of evaluat…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 9 pages, 4 figures, 5 tables

  3. arXiv:2410.03210  [pdf, other]

    cs.LG

    Tadashi: Enabling AI-Based Automated Code Generation With Guaranteed Correctness

    Authors: Emil Vatai, Aleksandr Drozd, Ivan R. Ivanov, Joao E. Batista, Yinghao Ren, Mohamed Wahib

    Abstract: Frameworks and domain-specific languages for auto-generating code have traditionally depended on human experts to implement rigorous methods ensuring the legality of code transformations. Recently, machine learning (ML) has gained traction for generating code optimized for specific hardware targets. However, ML approaches, particularly black-box neural networks, offer no guarantees on the correctnes…

    Submitted 2 June, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: Submitted to SC25

  4. arXiv:2403.15484  [pdf, other]

    cs.CL cs.LG

    RakutenAI-7B: Extending Large Language Models for Japanese

    Authors: Rakuten Group, Aaron Levine, Connie Huang, Chenguang Wang, Eduardo Batista, Ewa Szymanska, Hongyi Ding, Hou Wei Chou, Jean-François Pessiot, Johanes Effendi, Justin Chiu, Kai Torben Ohlhus, Karan Chopra, Keiji Shinzato, Koji Murakami, Lee Xiong, Lei Chen, Maki Kubota, Maksim Tkachenko, Miroku Lee, Naoki Takahashi, Prathyusha Jwalapuram, Ryutaro Tatsushima, Saurabh Jain, Sunil Kumar Yadav, et al. (5 additional authors not shown)

    Abstract: We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.

    Submitted 21 March, 2024; originally announced March 2024.

  5. arXiv:2309.14074  [pdf, other]

    cs.DC

    FlexCast: genuine overlay-based atomic multicast

    Authors: Eliã Batista, Paulo Coelho, Eduardo Alchieri, Fernando Dotti, Fernando Pedone

    Abstract: Atomic multicast is a communication abstraction where messages are propagated to groups of processes with reliability and order guarantees. Atomic multicast is at the core of strongly consistent storage and transactional systems. This paper presents FlexCast, the first genuine overlay-based atomic multicast protocol. Genuineness captures the essence of atomic multicast in that only the sender of a…

    Submitted 28 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  6. arXiv:2206.14567  [pdf, other]

    cs.CR cs.CY

    Contributions to Context-Aware Smart Healthcare: A Security and Privacy Perspective

    Authors: Edgar Batista

    Abstract: The management of health data, from their gathering to their analysis, raises a number of challenging issues due to their highly confidential nature. In particular, this dissertation contributes to several security and privacy challenges within the smart health paradigm. More concretely, we first develop some contributions to context-aware environments enabling smart health scenarios. We present…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Doctoral thesis

  7. arXiv:2205.12911  [pdf, other]

    cs.CR cs.AI

    SoK: Cross-border Criminal Investigations and Digital Evidence

    Authors: Fran Casino, Claudia Pina, Pablo López-Aguilar, Edgar Batista, Agusti Solanas, Constantinos Patsakis

    Abstract: Digital evidence underpins the majority of crimes, as its analysis is an integral part of almost every criminal investigation. Even if we temporarily disregard the numerous challenges in the collection and analysis of digital evidence, the exchange of the evidence among the different stakeholders has many thorny issues. Of specific interest are cross-border criminal investigations as the complexit…

    Submitted 25 May, 2022; originally announced May 2022.

  8. On the Compression of Neural Networks Using $\ell_0$-Norm Regularization and Weight Pruning

    Authors: Felipe Dennis de Resende Oliveira, Eduardo Luiz Ortiz Batista, Rui Seara

    Abstract: Despite the growing availability of high-capacity computational platforms, implementation complexity has remained a great concern for the real-world deployment of neural networks. This concern is not exclusively due to the huge costs of state-of-the-art network architectures, but also due to the recent push towards edge intelligence and the use of neural networks in embedded applications. In thi…

    Submitted 18 December, 2023; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: 34 pages, 9 figures, 7 tables

    ACM Class: I.2.6

  9. arXiv:2105.05983  [pdf, other]

    cs.LG cs.AI cs.RO

    An Open-Source Tool for Classification Models in Resource-Constrained Hardware

    Authors: Lucas Tsutsui da Silva, Vinicius M. A. Souza, Gustavo E. A. P. A. Batista

    Abstract: Applications that need to sense, measure, and gather real-time information from the environment frequently face three main restrictions: power consumption, cost, and lack of infrastructure. Most of the challenges imposed by these limitations can be better addressed by embedding Machine Learning (ML) classifiers in the hardware that senses the environment, creating smart sensors able to interpret t…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: This work has been submitted to the IEEE for possible publication

    MSC Class: 68T99 ACM Class: I.2.9

  10. arXiv:2102.04179  [pdf, other]

    cs.CV cs.AI

    Plotting time: On the usage of CNNs for time series classification

    Authors: Nuno M. Rodrigues, João E. Batista, Leonardo Trujillo, Bernardo Duarte, Mario Giacobini, Leonardo Vanneschi, Sara Silva

    Abstract: We present a novel approach for time series classification where we represent time series data as plot images and feed them to a simple CNN, outperforming several state-of-the-art methods. We propose a simple and highly replicable way of plotting the time series, and feed these images as input to a non-optimized shallow CNN, without any normalization or residual connections. These representations…

    Submitted 8 February, 2021; originally announced February 2021.

  11. Challenges in Benchmarking Stream Learning Algorithms with Real-world Data

    Authors: Vinicius M. A. Souza, Denis M. dos Reis, Andre G. Maletzke, Gustavo E. A. P. A. Batista

    Abstract: Streaming data are increasingly present in real-world applications such as sensor measurements, satellite data feed, stock market, and financial data. The main characteristics of these applications are the online arrival of data observations at high speed and the susceptibility to changes in the data distributions due to the dynamic nature of real environments. The data stream mining community sti…

    Submitted 30 June, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

    Comments: Preprint of article accepted for publication in the journal Data Mining and Knowledge Discovery

    MSC Class: 68T05 ACM Class: I.2.6

  12. arXiv:2002.00053  [pdf, other]

    cs.LG eess.IV stat.ML

    Improving the Detection of Burnt Areas in Remote Sensing using Hyper-features Evolved by M3GP

    Authors: João E. Batista, Sara Silva

    Abstract: One problem found when working with satellite images is the radiometric variations across the image and different images. Intending to improve remote sensing models for the classification of burnt areas, we set two objectives. The first is to understand the relationship between feature spaces and the predictive ability of the models, allowing us to explain the differences between learning and gene…

    Submitted 31 January, 2020; originally announced February 2020.

  13. arXiv:2001.07553  [pdf, other]

    cs.NE cs.LG stat.ML

    Ensemble Genetic Programming

    Authors: Nuno M. Rodrigues, João E. Batista, Sara Silva

    Abstract: Ensemble learning is a powerful paradigm that has been used in the top state-of-the-art machine learning methods like Random Forests and XGBoost. Inspired by the success of such methods, we have developed a new Genetic Programming method called Ensemble GP. The evolutionary cycle of Ensemble GP follows the same steps as other Genetic Programming systems, but with differences in the population struc…

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: EuroGP 2020 submission

  14. arXiv:1706.04109  [pdf, other]

    cs.AI cs.CY

    Technical Report: Implementation and Validation of a Smart Health Application

    Authors: Fran Casino, Constantinos Patsakis, Antoni Martinez-Balleste, Frederic Borras, Edgar Batista

    Abstract: In this article, we explain in detail the internal structures and databases of a smart health application. Moreover, we describe how to generate a statistically sound synthetic dataset using real-world medical data.

    Submitted 13 June, 2017; originally announced June 2017.

    Comments: 4-page Tech Report