Showing 1–40 of 40 results for author: Rosa, G

Searching in archive cs.
  1. arXiv:2505.15756  [pdf, other]

    cs.SE

    An Empirical Analysis of Vulnerability Detection Tools for Solidity Smart Contracts Using Line Level Manually Annotated Vulnerabilities

    Authors: Francesco Salzano, Cosmo Kevin Antenucci, Simone Scalabrino, Giovanni Rosa, Rocco Oliveto, Remo Pareschi

    Abstract: The rapid adoption of blockchain technology highlighted the importance of ensuring the security of smart contracts due to their critical role in automated business logic execution on blockchain platforms. This paper provides an empirical evaluation of automated vulnerability analysis tools specifically designed for Solidity smart contracts. Leveraging the extensive SmartBugs 2.0 framework, which i…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 38 pages, 3 figures

  2. arXiv:2505.05777  [pdf, ps, other]

    cs.SE cs.AI

    PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection

    Authors: Domenico Cotroneo, Giuseppe De Rosa, Pietro Liguori

    Abstract: This paper presents PyResBugs, a curated dataset of residual bugs, i.e., defects that persist undetected during traditional testing but later surface in production, collected from major Python frameworks. Each bug in the dataset is paired with its corresponding fault-free (fixed) version and annotated with multi-level natural language (NL) descriptions. These NL descriptions enable natural languag…

    Submitted 9 May, 2025; originally announced May 2025.

  3. arXiv:2504.21318  [pdf, other]

    cs.AI cs.CL

    Phi-4-reasoning Technical Report

    Authors: Marah Abdin, Sahaj Agarwal, Ahmed Awadallah, Vidhisha Balachandran, Harkirat Behl, Lingjiao Chen, Gustavo de Rosa, Suriya Gunasekar, Mojan Javaheripi, Neel Joshi, Piero Kauffmann, Yash Lara, Caio César Teodoro Mendes, Arindam Mitra, Besmira Nushi, Dimitris Papailiopoulos, Olli Saarikivi, Shital Shah, Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora Yousefi, Guoqing Zheng

    Abstract: We introduce Phi-4-reasoning, a 14-billion parameter reasoning model that achieves strong performance on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on a carefully curated set of "teachable" prompts, selected for the right level of complexity and diversity, and reasoning demonstrations generated using o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectivel…

    Submitted 30 April, 2025; originally announced April 2025.

  4. arXiv:2501.13706  [pdf]

    cs.CE physics.comp-ph

    Analysis of Eccentric Coaxial Waveguides Filled with Lossy Anisotropic Media via Finite Difference

    Authors: Raul O. Ribeiro, Maria A. Martinez, Guilherme S. Rosa, Rafael A. Penchel

    Abstract: This study presents a finite difference method (FDM) to model the electromagnetic field propagation in eccentric coaxial waveguides filled with lossy uniaxially anisotropic media. The formulation utilizes conformal transformation to map the eccentric circular waveguide into an equivalent concentric one. In the concentric problem, we introduce a novel normalized Helmholtz equation to decouple TM an…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: This work was presented at the SBMO 2024 - XXI Brazilian Symposium on Microwaves and Optoelectronics. For more information about the conference, please visit https://www.sbmo.org.br/sbmo/2024/home

  5. arXiv:2412.15743  [pdf]

    eess.SP cs.NI

    Coexistence Options and Performance Analysis of 100 Gbit/s Coherent PON in Brownfield DWDM Networks

    Authors: Gabriele Di Rosa, Martin Kuipers, Jim Zou, Ognjen Jovanovic, Jörg-Peter Elbers

    Abstract: We study system architectures for the coexistence of future coherent PON and DWDM networks. Considering deployed optical filters, we observe filtering penalties < 1 dB at a laser frequency accuracy < 12 GHz when using a cost-effective architecture.

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: This work has been partially funded by the German Federal Ministry of Economics and Climate Action in the project Helios-KT (16IPCEI201)

  6. arXiv:2412.08905  [pdf, other]

    cs.CL cs.AI

    Phi-4 Technical Report

    Authors: Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang, Rachel Ward, Yue Wu, Dingli Yu, et al. (2 additional authors not shown)

    Abstract: We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabil…

    Submitted 11 December, 2024; originally announced December 2024.

  7. arXiv:2408.13719  [pdf, other]

    cs.AI

    Count-based Novelty Exploration in Classical Planning

    Authors: Giacomo Rosa, Nir Lipovetzky

    Abstract: Count-based exploration methods are widely employed to improve the exploratory behavior of learning agents over sequential decision problems. Meanwhile, Novelty search has achieved success in Classical Planning through recording of the first, but not successive, occurrences of tuples. In order to structure the exploration, however, the number of tuples considered needs to grow exponentially as the…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Extended version of paper accepted for publication at ECAI 2024

  8. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, et al. (104 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version…

    Submitted 30 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 24 pages

  9. arXiv:2312.14676  [pdf, other]

    cs.NI

    Capitalizing on Next-Generation Optical Communication Systems with Proactive Multi-Period Network Planning

    Authors: Jasper Müller, Sai Kireet Patri, Gabriele Di Rosa, Achim Autenrieth, Jörg-Peter Elbers, Carmen Mas-Machuca

    Abstract: Optical transport network operators typically follow a pay-as-you-grow strategy for their network deployment. We propose a proactive multi-period planning approach based on heuristic network planning, supporting this deployment strategy while enabling efficient network utilization through next-generation technology. We report 60% fewer provisioned lightpaths.

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: The work has been partially funded by the German Federal Ministry of Education and Research in the project STARFALL (16KIS1418K)

    Journal ref: European Conference on Optical Communications (ECOC) 2023

  10. arXiv:2312.11005  [pdf, other]

    cs.NI eess.SP

    On the Benefits of Rate-Adaptive Transceivers: A Network Planning Study

    Authors: Jasper Müller, Gabriele Di Rosa, Tobias Fehenberger, Mario Wenning, Sai Kireet Patri, Jörg-Peter Elbers, Carmen Mas-Machuca

    Abstract: Flexible-grid Elastic Optical Networks (EONs) have been widely deployed in recent years to support the growing demand for bandwidth-intensive applications. To address this cost-efficiently, optimized utilization of EONs is required. Next-generation bandwidth-variable transceivers (BVTs) will offer increased adaptivity in symbol rate as well as modulation through probabilistic constellation shaping…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Copyright 2023 IEEE. This work has been partially funded in the framework of the CELTIC-NEXT project AI-NET-PROTECT (Project ID C2019/3-4) (#16KIS1279K) and in the programme "Souverän. Digital. Vernetzt." joint project 6G-life (#16KISK002) by the German Federal Ministry of Education and Research

  11. arXiv:2306.11644  [pdf, other]

    cs.CL cs.AI cs.LG

    Textbooks Are All You Need

    Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

    Abstract: We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…

    Submitted 2 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 26 pages; changed color scheme of plot. fixed minor typos and added couple clarifications

  12. Multi-Wavelength Transponders for High-capacity Optical Networks: A Physical-layer-aware Network Planning Study

    Authors: Jasper Müller, Ognjen Jovanovic, Tobias Fehenberger, Gabriele Di Rosa, Jörg-Peter Elbers, Carmen Mas-Machuca

    Abstract: Continued cost- and power-efficient capacity scaling in optical networks is imperative to keep pace with ever-increasing traffic demands. In this paper, we investigate multi-wavelength transponders as a potential way forward. Suitable system architectures and realistic specifications of multi-wavelength transponders are identified and analyzed in terms of transmit OSNR penalties and spectral const…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: The work has been partially funded by the German Federal Ministry of Education and Research in the project STARFALL (16KIS1418K) and the programme of "Souverän. Digital. Vernetzt." joint project 6G-life (16KISK002), as well as the European Research Council through the ERC-CoG FRECOM project (grant no. 771878)

    Journal ref: Journal of Optical Communications and Networking (JOCN) 2023

  13. arXiv:2303.15990  [pdf, other]

    cs.SE

    Automatically Generating Dockerfiles via Deep Learning: Challenges and Promises

    Authors: Giovanni Rosa, Antonio Mastropaolo, Simone Scalabrino, Gabriele Bavota, Rocco Oliveto

    Abstract: Containerization allows developers to define the execution environment in which their software needs to be installed. Docker is the leading platform in this field, and developers that use it are required to write a Dockerfile for their software. Writing Dockerfiles is far from trivial, especially when the system has unusual requirements for its execution environment. Despite several tools exist to…

    Submitted 28 March, 2023; originally announced March 2023.

  14. Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization

    Authors: Luiz C. F. Ribeiro, Mateus Roder, Gustavo H. de Rosa, Leandro A. Passos, João P. Papa

    Abstract: The continuous computational power growth in the last decades has made solving several optimization problems significant to humankind a tractable task; however, tackling some of them remains a challenge due to the overwhelming amount of candidate solutions to be evaluated, even by using sophisticated algorithms. In such a context, a set of nature-inspired stochastic methods, called meta-heuristic…

    Submitted 31 January, 2023; originally announced January 2023.

  15. A survey on text generation using generative adversarial networks

    Authors: Gustavo Henrique de Rosa, João Paulo Papa

    Abstract: This work presents a thorough review concerning recent studies and text generation advancements using Generative Adversarial Networks. The usage of adversarial learning for text generation is promising as it provides alternatives to generate the so-called "natural" language. Nevertheless, adversarial text generation is not a simple task as its foremost architecture, the Generative Adversarial Netw…

    Submitted 20 December, 2022; originally announced December 2022.

    Journal ref: Pattern Recognition 119 (2021): 108098

  16. arXiv:2212.09447  [pdf, other]

    cs.AI

    Improving Pre-Trained Weights Through Meta-Heuristics Fine-Tuning

    Authors: Gustavo H. de Rosa, Mateus Roder, João Paulo Papa, Claudio F. G. dos Santos

    Abstract: Machine Learning algorithms have been extensively researched throughout the last decade, leading to unprecedented advances in a broad range of applications, such as image classification and reconstruction, object recognition, and text categorization. Nonetheless, most Machine Learning algorithms are trained via derivative-based optimizers, such as the Stochastic Gradient Descent, leading to possib…

    Submitted 19 December, 2022; originally announced December 2022.

  17. arXiv:2212.06121  [pdf, other]

    cs.IR cs.CL

    In Defense of Cross-Encoders for Zero-Shot Retrieval

    Authors: Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

    Abstract: Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines. In this work we study the generalization ability of these two types of architectures on a wide range of parameter count on both in-domain and out-of-domain scenarios. We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.02873

  18. From Actions to Events: A Transfer Learning Approach Using Improved Deep Belief Networks

    Authors: Mateus Roder, Jurandy Almeida, Gustavo H. de Rosa, Leandro A. Passos, André L. D. Rossi, João P. Papa

    Abstract: In the last decade, exponential data growth supplied machine learning-based algorithms' capacity and enabled their usage in daily-life activities. Additionally, such an improvement is partially explained due to the advent of deep learning techniques, i.e., stacks of simple architectures that end up in more complex models. Although both factors produce outstanding results, they also pose drawbacks…

    Submitted 30 November, 2022; originally announced November 2022.

  19. arXiv:2210.03251  [pdf, other]

    cs.CL

    Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints

    Authors: Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah

    Abstract: Autocomplete is a task where the user inputs a piece of text, termed prompt, which is conditioned by the model to generate semantically coherent continuation. Existing works for this task have primarily focused on datasets (e.g., email, chat) with high frequency user prompt patterns (or focused prompts) where word-based language models have been quite effective. In this work, we study the more cha…

    Submitted 7 June, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: SustaiNLP 2023

  20. arXiv:2208.09097  [pdf, other]

    cs.SE

    Fixing Dockerfile Smells: An Empirical Study

    Authors: Giovanni Rosa, Simone Scalabrino, Rocco Oliveto

    Abstract: Background. Containerization technologies are widely adopted in the DevOps workflow. The most commonly used one is Docker, which requires developers to define a specification file (Dockerfile) to build the image used for creating containers. There are several best practice rules for writing Dockerfiles, but the developers do not always follow them. Violations of such practices, known as Dockerfile…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted at ICSME 2022 Registered Reports Track

  21. arXiv:2208.06264  [pdf, ps, other]

    cs.IR

    A Boring-yet-effective Approach for the Product Ranking Task of the Amazon KDD Cup 2022

    Authors: Vitor Jeronymo, Guilherme Rosa, Surya Kallumadi, Roberto Lotufo, Rodrigo Nogueira

    Abstract: In this work we describe our submission to the product ranking task of the Amazon KDD Cup 2022. We rely on a recipe that proved effective in previous competitions: we focus our efforts on efficiently training and deploying large language models, such as mT5, while reducing to a minimum the number of task-specific adaptations. Despite the simplicity of our approach, our best model was le…

    Submitted 9 August, 2022; originally announced August 2022.

  22. arXiv:2206.02873  [pdf, other]

    cs.IR cs.CL cs.PF

    No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval

    Authors: Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

    Abstract: Recent work has shown that small distilled language models are strong competitors to models that are orders of magnitude larger and slower in a wide range of information retrieval tasks. This has made distilled and dense models, due to latency constraints, the go-to choice for deployment in real-world retrieval applications. In this work, we question this practice by showing that the number of par…

    Submitted 12 December, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  23. arXiv:2205.15172  [pdf, ps, other]

    cs.CL

    Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task

    Authors: Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Roberto Lotufo, Rodrigo Nogueira

    Abstract: Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios. In this work, we experiment with zero-shot models in the legal case entailment task of the COLIEE 2022 competition. Our experiments show that scaling the number of parameters in a language model improves the F1 score of our previous zero-shot resu…

    Submitted 30 May, 2022; originally announced May 2022.

  24. arXiv:2203.13856  [pdf, other]

    eess.IV cs.CV cs.LG

    Robust deep learning for eye fundus images: Bridging real and synthetic data for enhancing generalization

    Authors: Guilherme C. Oliveira, Gustavo H. Rosa, Daniel C. G. Pedronette, João P. Papa, Himeesh Kumar, Leandro A. Passos, Dinesh Kumar

    Abstract: Deep learning applications for assessing medical images are limited because the datasets are often small and imbalanced. The use of synthetic data has been proposed in the literature, but neither a robust comparison of the different methods nor generalizability has been reported. Our approach integrates a retinal image quality assessment model and StyleGAN2 architecture to enhance Age-related Macu…

    Submitted 3 April, 2024; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted by the Biomedical Signal Processing and Control

    Journal ref: Biomedical Signal Processing and Control, 94 (2024), 106263

  25. arXiv:2203.02094  [pdf, other]

    cs.LG cs.CL

    LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

    Authors: Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck, Farinaz Koushanfar, Debadeepta Dey

    Abstract: The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models. However, finding architectures with the optimal trade-off between task performance (perplexity) and hardware constraints like peak memory utilization and latency is non-trivial. This is exacerbated by the proliferation of various hardware. We leverage the somewhat surprising empir…

    Submitted 17 October, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  26. arXiv:2202.03854  [pdf, ps, other]

    cs.LG cs.AI

    Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification

    Authors: Gustavo Henrique de Rosa, Mateus Roder, João Paulo Papa

    Abstract: Machine Learning has attracted considerable attention throughout the past decade due to its potential to solve far-reaching tasks, such as image classification, object recognition, anomaly detection, and data forecasting. A standard approach to tackle such applications is based on supervised learning, which is assisted by large sets of labeled data and is conducted by the so-called classifiers, su…

    Submitted 8 February, 2022; originally announced February 2022.

    Comments: 16 pages, 2 figures

    MSC Class: 68T01 ACM Class: I.2.0

  27. To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment

    Authors: Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto de Alencar Lotufo, Rodrigo Nogueira

    Abstract: There has been mounting evidence that pretrained language models fine-tuned on large and diverse supervised datasets can transfer well to a variety of out-of-domain tasks. In this work, we investigate this transfer ability to the legal domain. For that, we participated in the legal case entailment task of COLIEE 2021, in which we use such models with no adaptations to the target domain. Our submis…

    Submitted 7 February, 2022; originally announced February 2022.

  28. arXiv:2201.05658  [pdf, other]

    cs.AI cs.CL

    Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents

    Authors: Ramon Pires, Fábio C. de Souza, Guilherme Rosa, Roberto A. Lotufo, Rodrigo Nogueira

    Abstract: A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts. In a production pipeline, requirements often change, with classes being added and removed, which leads to nontrivial modifications to the source code and the possible introduction of bugs. In this work, we evaluate sequence-to-sequence models a…

    Submitted 14 January, 2022; originally announced January 2022.

  29. arXiv:2106.11828  [pdf, ps, other]

    cs.LG cs.AI

    Speeding Up OPFython with Numba

    Authors: Gustavo H. de Rosa, João Paulo Papa

    Abstract: A graph-inspired classifier, known as Optimum-Path Forest (OPF), has proven to be a state-of-the-art algorithm comparable to Logistic Regressors and Support Vector Machines in a wide variety of tasks. Recently, its Python-based version, denoted as OPFython, has been proposed to provide a more friendly framework and a faster prototyping environment. Nevertheless, Python-based algorithms are slower tha…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: 12 pages, 1 figure

    MSC Class: 68T10 ACM Class: I.2.1; I.2.5

  30. arXiv:2105.06813  [pdf, other]

    cs.CL cs.IR cs.LG

    A cost-benefit analysis of cross-lingual transfer methods

    Authors: Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

    Abstract: An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner. Translating examples at training time or inference time are also viable alternatives. However, there are costs associated with these methods that are rarely addressed in the literature. In this work, we…

    Submitted 14 December, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

  31. arXiv:2105.05686  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    Yes, BM25 is a Strong Baseline for Legal Case Retrieval

    Authors: Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto Lotufo, Rodrigo Nogueira

    Abstract: We describe our single submission to task 1 of COLIEE 2021. Our vanilla BM25 got second place, well above the median of submissions. Code is available at https://github.com/neuralmind-ai/coliee.

    Submitted 25 October, 2021; v1 submitted 26 April, 2021; originally announced May 2021.

  32. arXiv:2102.03300  [pdf, other]

    cs.SE

    Evaluating SZZ Implementations Through a Developer-informed Oracle

    Authors: Giovanni Rosa, Luca Pascarella, Simone Scalabrino, Rosalia Tufano, Gabriele Bavota, Michele Lanza, Rocco Oliveto

    Abstract: The SZZ algorithm for identifying bug-inducing changes has been widely used to evaluate defect prediction techniques and to empirically investigate when, how, and by whom bugs are introduced. Over the years, researchers have proposed several heuristics to improve the SZZ accuracy, providing various implementations of SZZ. However, fairly evaluating those implementations on a reliable oracle is an…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021)

  33. Energy-based Dropout in Restricted Boltzmann Machines: Why not go random

    Authors: Mateus Roder, Gustavo H. de Rosa, Victor Hugo C. de Albuquerque, André L. D. Rossi, João P. Papa

    Abstract: Deep learning architectures have been widely fostered throughout the last years, being used in a wide range of applications, such as object recognition, image reconstruction, and signal processing. Nevertheless, such models suffer from a common problem known as overfitting, which limits the network from predicting unseen data effectively. Regularization approaches arise in an attempt to address su…

    Submitted 17 January, 2021; originally announced January 2021.

  34. A Nature-Inspired Feature Selection Approach based on Hypercomplex Information

    Authors: Gustavo H. de Rosa, João Paulo Papa, Xin-She Yang

    Abstract: Feature selection for a given model can be transformed into an optimization task. The essential idea behind it is to find the most suitable subset of features according to some criterion. Nature-inspired optimization can mitigate this problem by producing compelling yet straightforward solutions when dealing with complicated fitness functions. Additionally, new mathematical representations, such a…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: 17 pages, 7 figures

    ACM Class: I.2.0

    Journal ref: APPLIED SOFT COMPUTING; v. 94, SEP 2020

  35. arXiv:2101.01042  [pdf, ps, other]

    cs.LG cs.CV

    Fast Ensemble Learning Using Adversarially-Generated Restricted Boltzmann Machines

    Authors: Gustavo H. de Rosa, Mateus Roder, João P. Papa

    Abstract: Machine Learning has been applied in a wide range of tasks throughout the last years, ranging from image classification to autonomous driving and natural language processing. Restricted Boltzmann Machine (RBM) has received recent attention and relies on an energy-based structure to model data probability distributions. Notwithstanding, such a technique is susceptible to adversarial manipulation, i…

    Submitted 4 January, 2021; originally announced January 2021.

    Comments: 26 pages, 7 figures

    ACM Class: I.2.0

  36. arXiv:2004.03456  [pdf, other]

    cs.LG eess.SP stat.ML

    Binary and Multiclass Classifiers based on Multitaper Spectral Features for Epilepsy Detection

    Authors: Jefferson Tales Oliva, João Luís Garcia Rosa

    Abstract: Epilepsy is one of the most common neurological disorders that can be diagnosed through electroencephalogram (EEG), in which the following epileptic events can be observed: pre-ictal, ictal, post-ictal, and interictal. In this paper, we present a novel method for epilepsy detection in two differentiation contexts: binary and multiclass classification. For feature extraction, a total of 105 measu…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: 19 pages, 6 figures, 10 tables. Obs.: in the text, English editing is required. A new version of this text will be available once we have completed their review

  37. arXiv:2003.07443  [pdf, other]

    cs.LG cs.CV stat.ML

    Learnergy: Energy-based Machine Learners

    Authors: Mateus Roder, Gustavo Henrique de Rosa, João Paulo Papa

    Abstract: Throughout the last years, machine learning techniques have been broadly encouraged in the context of deep learning architectures. An exciting algorithm denoted as Restricted Boltzmann Machine relies on energy- and probabilistic-based nature to tackle the most diverse applications, such as classification, reconstruction, and generation of images and signals. Nevertheless, one can see they are not…

    Submitted 23 September, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

    Comments: 12 pages, 12 figures

    MSC Class: 68T07 ACM Class: I.2.0; I.5.0

  38. arXiv:2001.10420  [pdf, other]

    cs.LG cs.CV stat.ML

    OPFython: A Python-Inspired Optimum-Path Forest Classifier

    Authors: Gustavo Henrique de Rosa, João Paulo Papa, Alexandre Xavier Falcão

    Abstract: Machine learning techniques have been paramount throughout the last years, being applied in a wide range of tasks, such as classification, object recognition, person identification, and image segmentation. Nevertheless, conventional classification algorithms, e.g., Logistic Regression, Decision Trees, and Bayesian classifiers, might lack complexity and diversity, not suitable when dealing with rea…

    Submitted 30 July, 2021; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: 14 pages, 11 figures

    MSC Class: 68T01 ACM Class: I.2.0; I.5.0

    Journal ref: Software Impacts, 1 (2021), Article 100113

  39. arXiv:1912.13002  [pdf, other]

    cs.NE

    Opytimizer: A Nature-Inspired Python Optimizer

    Authors: Gustavo H. de Rosa, Douglas Rodrigues, João P. Papa

    Abstract: Optimization aims at selecting a feasible set of parameters in an attempt to solve a particular problem, being applied in a wide range of applications, such as operations research, machine learning fine-tuning, and control engineering, among others. Nevertheless, traditional iterative optimization methods use the evaluation of gradients and Hessians to find their solutions, not being practical due…

    Submitted 2 December, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

  40. arXiv:1704.05174  [pdf, ps, other]

    cs.NE

    LibOPT: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques

    Authors: Joao Paulo Papa, Gustavo Henrique Rosa, Douglas Rodrigues, Xin-She Yang

    Abstract: Optimization techniques play an important role in several scientific and real-world applications, thus becoming of great interest for the community. As a consequence, a number of open-source libraries are available in the literature, which ends up fostering the research and development of new techniques and applications. In this work, we present a new library for the implementation and fast protot…

    Submitted 17 April, 2017; originally announced April 2017.