-
Family-Based Fingerprint Analysis: A Position Paper
Authors:
Carlos Diego Nascimento Damasceno,
Daniel Strüber
Abstract:
Thousands of vulnerabilities are reported every month to security repositories such as the National Vulnerability Database. Among these vulnerabilities, software misconfiguration is one of the top 10 security risks for web applications. With this large influx of vulnerability reports, software fingerprinting has become a highly desired capability for discovering distinctive and efficient signatures and recognizing reportedly vulnerable software implementations. Due to the exponential worst-case complexity of fingerprint matching, more efficient fingerprinting methods are highly desirable, especially for variability-intensive systems, where optional features add another exponential factor to the analysis. This position paper presents our vision of a framework that lifts model learning and family-based analysis principles to software fingerprinting. In this framework, we propose unifying databases of signatures into a featured finite state machine and using presence conditions to specify whether and under which circumstances a given input-output trace is observed. We believe feature-based signatures can improve performance by reducing the size of the fingerprints under analysis.
Submitted 27 September, 2022;
originally announced September 2022.
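The idea of attaching presence conditions to signature traces can be illustrated with a minimal sketch. It is not the authors' framework: the `Transition` class, the `run_trace` helper, the feature name `tls`, and the banner outputs are all hypothetical, and a configuration is assumed to be a plain set of enabled feature names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

# Assumption: a product configuration is the set of enabled feature names.
Config = FrozenSet[str]


@dataclass(frozen=True)
class Transition:
    source: str
    input: str
    output: str
    target: str
    # Presence condition: returns True when the transition exists in a configuration.
    condition: Callable[[Config], bool]


def run_trace(transitions: List[Transition], initial: str,
              config: Config, inputs: List[str]) -> Optional[List[str]]:
    """Return the outputs observed for `inputs` under `config`.

    Returns None when some input has no enabled transition in the current state.
    """
    state = initial
    outputs = []
    for symbol in inputs:
        enabled = [t for t in transitions
                   if t.source == state and t.input == symbol and t.condition(config)]
        if not enabled:
            return None
        step = enabled[0]  # assumes at most one enabled transition per state and input
        outputs.append(step.output)
        state = step.target
    return outputs


# Hypothetical unified signature model for two variants of one server.
TRANSITIONS = [
    Transition("s0", "hello", "banner_v1", "s1", lambda c: "tls" not in c),
    Transition("s0", "hello", "banner_v2", "s1", lambda c: "tls" in c),
    Transition("s1", "quit", "bye", "s0", lambda c: True),
]

with_tls = run_trace(TRANSITIONS, "s0", frozenset({"tls"}), ["hello", "quit"])
without_tls = run_trace(TRANSITIONS, "s0", frozenset(), ["hello", "quit"])
```

Here a single machine stores both variants, and the presence conditions decide which input-output trace is observed for a given configuration.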
-
Adaptive Behavioral Model Learning for Software Product Lines
Authors:
Shaghayegh Tavassoli,
Carlos Diego Nascimento Damasceno,
Ramtin Khosravi,
Mohammad Reza Mousavi
Abstract:
Behavioral models enable the analysis of the functionality of software product lines (SPLs), e.g., through model checking and model-based testing. Model learning aims at constructing behavioral models for software systems in some form of a finite state machine. Due to the commonalities among the products of an SPL, it is possible to reuse previously learned models during the model learning process. In this paper, an adaptive approach (the $\text{PL}^*$ method) for learning the product models of an SPL is presented, based on the well-known $L^*$ algorithm. In this method, after the model of each product is learned, the sequences in the final observation table are stored in a repository, which is then used to initialize the observation table of the remaining products to be learned. The proposed algorithm is evaluated on two open-source SPLs, and the total learning cost is measured in terms of the number of rounds, the total number of resets, and the total number of input symbols. The results show that for complex SPLs, the total learning cost of the $\text{PL}^*$ method is significantly lower than that of the non-adaptive learning method in terms of all three metrics. Furthermore, it is observed that the order in which the products are learned affects the efficiency of the $\text{PL}^*$ method. Based on this observation, we introduce a heuristic to determine an ordering that reduces the total cost of adaptive learning in both case studies.
Submitted 1 August, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
A Benchmark for Active Learning of Variability-Intensive Systems
Authors:
Shaghayegh Tavassoli,
Carlos Diego Nascimento Damasceno,
Mohammad Reza Mousavi,
Ramtin Khosravi
Abstract:
Behavioral models are the key enablers for behavioral analysis of Software Product Lines (SPLs), including testing and model checking. Active model learning comes to the rescue when family behavioral models are non-existent or outdated. A key challenge in active model learning is to detect commonalities and variability efficiently and combine them into concise family models. Benchmarks and their associated metrics will play a key role in shaping the research agenda in this promising field and provide an effective means for comparing and identifying relative strengths and weaknesses in the forthcoming techniques. In this challenge, we seek benchmarks to evaluate the efficiency (e.g., learning time and memory footprint) and effectiveness (e.g., conciseness and accuracy of family models) of active model learning methods in the software product line context. These benchmark sets must contain the structural and behavioral variability models of at least one SPL. Each SPL in a benchmark must contain products that require more than one round of model learning with respect to the basic active learning $L^{*}$ algorithm. Alternatively, tools supporting the synthesis of artificial benchmark models are also welcome.
Submitted 10 March, 2022;
originally announced March 2022.
-
Towards Multi-Criteria Prioritization of Best Practices in Research Artifact Sharing
Authors:
Carlos Diego Nascimento Damasceno,
Isotilia Costa Melo,
Daniel Strüber
Abstract:
Research artifact sharing is known to strengthen the transparency of scientific studies. However, in the absence of common discipline-specific guidelines for artifact evaluation, subjective and conflicting expectations may arise and threaten artifact quality. In this paper, we discuss our preliminary ideas for a framework based on quality management principles (5W2H) that can aid in the establishment of common guidelines for artifact evaluation and sharing. Also, using the Analytic Hierarchy Process, we discuss how research communities could join efforts to keep the guidelines adequate to their research priorities. The combination of these methodologies is novel in software engineering research and can foster research software sustainability.
Submitted 6 September, 2021;
originally announced September 2021.
-
Quality Guidelines for Research Artifacts in Model-Driven Engineering
Authors:
Carlos Diego Nascimento Damasceno,
Daniel Strüber
Abstract:
Sharing research artifacts is known to help people build upon existing knowledge, adopt novel contributions in practice, and increase the chances of papers receiving attention. In Model-Driven Engineering (MDE), openly providing research artifacts plays a key role, even more so as the community targets a broader use of AI techniques, which can only become feasible if large open datasets and confidence measures for their quality are available. However, the current lack of common discipline-specific guidelines for research data sharing opens the door to misunderstandings about the true potential of research artifacts and to subjective expectations regarding artifact quality. To address this issue, we introduce a set of guidelines for artifact sharing specifically tailored to MDE research. To design this set of guidelines, we systematically analyzed general-purpose artifact sharing practices of major computer science venues and tailored them to the MDE domain. Subsequently, we conducted an online survey with 90 researchers and practitioners with expertise in MDE. We investigated our participants' experiences in developing and sharing artifacts in MDE research and the challenges encountered while doing so. We then asked them to prioritize each of our guidelines as essential, desirable, or unnecessary. Finally, we asked them to evaluate our guidelines with respect to clarity, completeness, and relevance. In each of these dimensions, our guidelines were assessed positively by more than 92% of the participants. To foster the reproducibility and reusability of our results, we make the full set of generated artifacts available in an open repository at https://mdeartifacts.github.io/.
Submitted 15 November, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
A Probabilistic Approach to Reliability Analysis of Data Generated in Multiplex Sequencing on the ABI SOLiD Platform
Authors:
Fabio M. F. Lobato,
Carlos D. N. Damasceno,
Péricles L. Machado,
Nandamudi L. Vijaykumar,
André R. dos Santos,
Sylvain H. Darnet,
André N. A. Gonçalves,
Dayse O. de Alencar,
Ádamo L. de Santana
Abstract:
Next-generation sequencers such as the Illumina and SOLiD platforms generate large amounts of data, commonly more than 10 gigabytes of text files. In particular, the SOLiD platform allows multiple samples to be sequenced in a single run, called a multiplex run, through a tagging system called Barcode. This feature requires a computational process to separate the data by sample, because the sequencer provides a mixture of all samples in a single output. This process must be reliable, since errors in it may compromise further analysis. In this context, we identified the need to develop a probabilistic model capable of assigning a degree of confidence to the tagging system used in multiplex sequencing. The results confirmed the adequacy of the resulting model, which can, among other things, guide data filtering and the evaluation of the sequencing protocol used.
Submitted 11 August, 2021; v1 submitted 27 July, 2021;
originally announced July 2021.
-
SimCleaner -- A Database Standardization System Using Similarity Functions
Authors:
Carlos Diego Nascimento Damasceno,
Fabio Manoel França Lobato,
Elton Rocha Moutinho,
Arilene Santos de França,
Ivan Ikikame de Oliveira,
Ádamo Lima de Santana
Abstract:
The Knowledge Discovery in Databases (KDD) process enables the detection of patterns in databases. This analysis may be compromised if the database is inconsistent, which makes data cleaning techniques necessary. This paper presents a tool based on similarity functions that supports database preprocessing. The tool behaved efficiently in standardizing a database of the Public Security System of the State of Pará and may be reused with other databases and in other data mining projects.
Submitted 11 August, 2021; v1 submitted 27 July, 2021;
originally announced July 2021.
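As a rough illustration of similarity-based standardization in general, the sketch below maps free-text values onto a list of canonical values. It is not the SimCleaner implementation: the abstract does not say which similarity functions the tool uses, so Python's `difflib.SequenceMatcher` ratio is swapped in as an assumed similarity function, and the `standardize` helper, the 0.8 threshold, and the sample values are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def standardize(values: List[str], canonical: List[str],
                threshold: float = 0.8) -> List[str]:
    """Replace each value by its most similar canonical form.

    A value is left unchanged when no canonical form reaches `threshold`.
    """
    result = []
    for value in values:
        best = max(canonical, key=lambda c: similarity(value, c))
        result.append(best if similarity(value, best) >= threshold else value)
    return result


# Hypothetical column of inconsistently typed category names.
raw = ["Robery", "burglery", "ROBBERY ", "fraud"]
cleaned = standardize(raw, ["robbery", "burglary"])
```

Values that stay unchanged, such as `fraud` here, would be candidates for manual review or for extending the canonical list.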