Search | arXiv e-print repository

doi 10.1007/978-3-031-15629-8_8

Family-Based Fingerprint Analysis: A Position Paper

Authors: Carlos Diego Nascimento Damasceno, Daniel Strüber

Abstract: Thousands of vulnerabilities are reported on a monthly basis to security repositories, such as the National Vulnerability Database. Among these vulnerabilities, software misconfiguration is one of the top 10 security risks for web applications. With this large influx of vulnerability reports, software fingerprinting has become a highly desired capability to discover distinctive and efficient signatures and recognize reportedly vulnerable software implementations. Due to the exponential worst-case complexity of fingerprint matching, designing more efficient methods for fingerprinting becomes highly desirable, especially for variability-intensive systems, where optional features add another exponential factor to the analysis. This position paper presents our vision of a framework that lifts model learning and family-based analysis principles to software fingerprinting. In this framework, we propose unifying databases of signatures into a featured finite state machine and using presence conditions to specify whether and in which circumstances a given input-output trace is observed. We believe feature-based signatures can improve performance by reducing the size of the fingerprints under analysis.

Submitted 27 September, 2022; originally announced September 2022.

Comments: Paper published in the proceedings volume A Journey from Process Algebra via Timed Automata to Model Learning: Essays Dedicated to Frits Vaandrager on the Occasion of His 60th Birthday (2022)
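The abstract proposes unifying signatures into a featured finite state machine (FFSM) whose presence conditions state in which configurations an input-output trace is observed. A minimal Python sketch of that idea follows. It assumes presence conditions are encoded as predicates over a set of enabled features and that every product is deterministic. The `FFSM` class, the `LEGACY` feature, and the protocol messages are hypothetical illustrations, not the framework described in the paper. `matching_configs` plays the role of fingerprinting: it returns the configurations consistent with an observed trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

Config = FrozenSet[str]                       # set of enabled features
PresenceCondition = Callable[[Config], bool]  # predicate over configurations
Trace = List[Tuple[str, str]]                 # sequence of (input, output) pairs


def always(_: Config) -> bool:
    """Presence condition for transitions shared by every product."""
    return True


@dataclass
class FFSM:
    """Featured finite state machine: Mealy transitions guarded by presence conditions.

    Assumes that, for each configuration, at most one transition per
    (state, input) pair is enabled, so every product is deterministic.
    """
    initial: str
    transitions: Dict[Tuple[str, str], List[Tuple[PresenceCondition, str, str]]] = field(
        default_factory=dict)

    def add(self, src: str, inp: str, out: str, dst: str,
            pc: PresenceCondition = always) -> None:
        self.transitions.setdefault((src, inp), []).append((pc, out, dst))

    def step(self, state: str, inp: str, config: Config) -> Optional[Tuple[str, str]]:
        """Return (output, next state) of the transition enabled under config, if any."""
        for pc, out, dst in self.transitions.get((state, inp), []):
            if pc(config):
                return out, dst
        return None

    def observes(self, trace: Trace, config: Config) -> bool:
        """True if the product selected by config produces exactly this I/O trace."""
        state = self.initial
        for inp, expected in trace:
            result = self.step(state, inp, config)
            if result is None or result[0] != expected:
                return False
            state = result[1]
        return True

    def matching_configs(self, trace: Trace, configs: Iterable[Config]) -> List[Config]:
        """Configurations (candidate fingerprints) consistent with an observed trace."""
        return [c for c in configs if self.observes(trace, c)]


# Hypothetical two-variant protocol; "LEGACY" is an illustrative feature name.
m = FFSM(initial="s0")
m.add("s0", "Hello", "HelloAck", "s1")
m.add("s1", "Finish", "Alert", "s0", pc=lambda c: "LEGACY" not in c)
m.add("s1", "Finish", "Done", "s2", pc=lambda c: "LEGACY" in c)

trace = [("Hello", "HelloAck"), ("Finish", "Done")]
configs = [frozenset(), frozenset({"LEGACY"})]
print(m.matching_configs(trace, configs))  # [frozenset({'LEGACY'})]
```

Checking each configuration separately, as done here, is the product-by-product cost that family-based analysis aims to avoid. A family-based variant would instead propagate presence conditions symbolically along the trace (for example, with a SAT or BDD representation) in a single pass.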

arXiv:2207.04823

Adaptive Behavioral Model Learning for Software Product Lines

Authors: Shaghayegh Tavassoli, Carlos Diego Nascimento Damasceno, Ramtin Khosravi, Mohammad Reza Mousavi

Abstract: Behavioral models enable the analysis of the functionality of software product lines (SPL), e.g., via model checking and model-based testing. Model learning aims at constructing behavioral models for software systems in some form of a finite state machine. Due to the commonalities among the products of an SPL, it is possible to reuse previously learned models during the model learning process. In this paper, an adaptive approach (the $\text{PL}^*$ method) for learning the product models of an SPL is presented, based on the well-known $L^*$ algorithm. In this method, after each product is learned, the sequences in its final observation table are stored in a repository, which is then used to initialize the observation tables of the remaining products to be learned. The proposed algorithm is evaluated on two open-source SPLs, and the total learning cost is measured in terms of the number of rounds, the total number of resets, and the total number of input symbols. The results show that for complex SPLs, the total learning cost of the $\text{PL}^*$ method is significantly lower than that of the non-adaptive learning method in terms of all three metrics. Furthermore, we observe that the order in which the products are learned affects the efficiency of the $\text{PL}^*$ method. Based on this observation, we introduce a heuristic to determine an ordering that reduces the total cost of adaptive learning in both case studies.

Submitted 1 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: 12 pages, 10 figures, Paper accepted in the Research Track of the 26th ACM International Systems and Software Product Line Conference (SPLC 2022)
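The initialization step of $\text{PL}^*$ described above can be sketched in Python. This is a hedged illustration of the idea only, not the authors' implementation. It assumes Mealy machines encoded as dictionaries, an output oracle that counts resets and input symbols (two of the cost metrics named in the abstract), and hypothetical products `P1` and `P2` that differ in a single transition. It shows that seeding the observation table of `P2` with the prefixes (S) and suffixes (E) assumed to be stored after learning `P1` yields a table that is already closed, whereas the plain $L^*$ start is not. Equivalence queries, consistency checks, and the product-ordering heuristic are left out.

```python
from typing import Dict, Sequence, Tuple

Word = Tuple[str, ...]
Mealy = Dict[Tuple[str, str], Tuple[str, str]]  # (state, input) -> (output, next state)
Row = Tuple[Word, ...]


class OutputOracle:
    """Answers output queries on a Mealy machine, counting resets and input symbols."""

    def __init__(self, machine: Mealy, initial: str) -> None:
        self.machine, self.initial = machine, initial
        self.resets = 0
        self.symbols = 0

    def query(self, word: Word) -> Word:
        self.resets += 1  # every query starts from a reset
        self.symbols += len(word)
        state, outputs = self.initial, []
        for inp in word:
            out, state = self.machine[(state, inp)]
            outputs.append(out)
        return tuple(outputs)


def fill_table(oracle: OutputOracle, prefixes: Sequence[Word],
               suffixes: Sequence[Word], inputs: Sequence[str]) -> Dict[Word, Row]:
    """Rows for S and S.I: the outputs each suffix in E produces after the prefix."""
    candidates = set(prefixes) | {p + (a,) for p in prefixes for a in inputs}
    rows: Dict[Word, Row] = {}
    for p in sorted(candidates, key=lambda w: (len(w), w)):
        rows[p] = tuple(oracle.query(p + e)[len(p):] for e in suffixes)
    return rows


def is_closed(rows: Dict[Word, Row], prefixes: Sequence[Word], inputs: Sequence[str]) -> bool:
    """Closed if every one-input extension of S has the same row as some prefix in S."""
    s_rows = {rows[p] for p in prefixes}
    return all(rows[p + (a,)] in s_rows for p in prefixes for a in inputs)


INPUTS = ("a", "b")
# Two hypothetical products that differ in one transition.
P1: Mealy = {("q0", "a"): ("1", "q1"), ("q0", "b"): ("0", "q0"),
             ("q1", "a"): ("0", "q0"), ("q1", "b"): ("1", "q1")}
P2: Mealy = dict(P1)
P2[("q1", "b")] = ("1", "q0")

# Assumed repository content: S and E from the final (closed) table learned for P1.
repository = {"S": [(), ("a",)], "E": [("a",), ("b",)]}

# Plain L* start for P2: S = {empty word}, E = single inputs.
fresh = OutputOracle(P2, "q0")
fresh_rows = fill_table(fresh, [()], [(a,) for a in INPUTS], INPUTS)
print(is_closed(fresh_rows, [()], INPUTS))  # False: a closing step is needed

# Adaptive start for P2: seed the table with the repository sequences.
seeded = OutputOracle(P2, "q0")
seeded_rows = fill_table(seeded, repository["S"], repository["E"], INPUTS)
print(is_closed(seeded_rows, repository["S"], INPUTS))  # True
print(seeded.resets, seeded.symbols)  # 10 22
```

In this tiny example, closing the plain table by moving `("a",)` into S would require exactly the queries the seeded table already made, so the sketch illustrates the mechanism (skipping the closing step) rather than the cost savings reported in the paper.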

arXiv:2203.05215

A Benchmark for Active Learning of Variability-Intensive Systems

Authors: Shaghayegh Tavassoli, Carlos Diego Nascimento Damasceno, Mohammad Reza Mousavi, Ramtin Khosravi

Abstract: Behavioral models are the key enablers for behavioral analysis of Software Product Lines (SPL), including testing and model checking. Active model learning comes to the rescue when family behavioral models are non-existent or outdated. A key challenge in active model learning is to detect commonalities and variability efficiently and combine them into concise family models. Benchmarks and their associated metrics will play a key role in shaping the research agenda in this promising field and provide an effective means for comparing forthcoming techniques and identifying their relative strengths and weaknesses. In this challenge, we seek benchmarks to evaluate the efficiency (e.g., learning time and memory footprint) and effectiveness (e.g., conciseness and accuracy of family models) of active model learning methods in the software product line context. These benchmark sets must contain the structural and behavioral variability models of at least one SPL. Each SPL in a benchmark must contain products that require more than one round of model learning with respect to the basic active learning $L^{*}$ algorithm. Alternatively, tools supporting the synthesis of artificial benchmark models are also welcome.

Submitted 10 March, 2022; originally announced March 2022.

Comments: 5 pages, 3 figures, Paper accepted in the Challenge Cases Track of the 26th ACM International Systems and Software Product Line Conference (SPLC 2022)

arXiv:2109.02304

Towards Multi-Criteria Prioritization of Best Practices in Research Artifact Sharing

Authors: Carlos Diego Nascimento Damasceno, Isotilia Costa Melo, Daniel Strüber

Abstract: Research artifact sharing is known to strengthen the transparency of scientific studies. However, in the absence of common discipline-specific guidelines for artifact evaluation, subjective and conflicting expectations may arise and threaten artifact quality. In this paper, we discuss our preliminary ideas for a framework based on quality management principles (5W2H) that can aid in the establishment of common guidelines for artifact evaluation and sharing. Also, using the Analytic Hierarchy Process, we discuss how research communities could join efforts to align the guidelines with research priorities. These combined methodologies constitute a novelty for software engineering research that can foster research software sustainability.

Submitted 6 September, 2021; originally announced September 2021.

Comments: 5 pages, 2 figures, Emerging results paper published in the 1st Workshop on Open Science Practices for Software Engineering (OpenScienSE 2021)
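The Analytic Hierarchy Process (AHP) mentioned in this abstract derives priority weights from pairwise comparisons. A minimal Python sketch follows, assuming three hypothetical guideline criteria and a made-up comparison matrix on Saaty's 1 to 9 scale; the criteria and values are illustrative, not data from the paper. It approximates the principal eigenvector by power iteration and reports Saaty's consistency ratio, CR = CI / RI with CI = (lambda_max - n) / (n - 1).

```python
from typing import List, Tuple

# Saaty's random consistency index (RI) for matrices of size 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_priorities(matrix: List[List[float]], iterations: int = 100) -> Tuple[List[float], float]:
    """Return (priority weights, consistency ratio) for a reciprocal comparison matrix.

    Weights approximate the principal eigenvector via power iteration.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [x / total for x in product]
    if n <= 2:
        return weights, 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # lambda_max estimated as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return weights, consistency_index / RANDOM_INDEX[n - 1]


# Hypothetical criteria for prioritizing artifact guidelines.
criteria = ["documentation", "executability", "licensing"]
comparisons = [
    [1.0, 2.0, 4.0],   # documentation compared with each criterion
    [0.5, 1.0, 2.0],   # executability
    [0.25, 0.5, 1.0],  # licensing
]
weights, cr = ahp_priorities(comparisons)
for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
print("acceptably consistent:", cr < 0.1)  # Saaty's usual 0.1 threshold
```

This matrix is perfectly consistent, so the weights come out as 4/7, 2/7, and 1/7. In a community setting, each participant's judgments would typically be aggregated (for example, by geometric mean) before computing priorities.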

arXiv:2108.04652

doi 10.1109/MODELS50736.2021.00036

Quality Guidelines for Research Artifacts in Model-Driven Engineering

Authors: Carlos Diego Nascimento Damasceno, Daniel Strüber

Abstract: Sharing research artifacts is known to help people build upon existing knowledge, adopt novel contributions in practice, and increase the chances of papers receiving attention. In Model-Driven Engineering (MDE), openly providing research artifacts plays a key role, even more so as the community targets a broader use of AI techniques, which can only become feasible if large open datasets and confidence measures for their quality are available. However, the current lack of common discipline-specific guidelines for research data sharing opens the door to misunderstandings about the true potential of research artifacts and to subjective expectations regarding artifact quality. To address this issue, we introduce a set of guidelines for artifact sharing specifically tailored to MDE research. To design this set of guidelines, we systematically analyzed general-purpose artifact sharing practices of major computer science venues and tailored them to the MDE domain. Subsequently, we conducted an online survey with 90 researchers and practitioners with expertise in MDE. We investigated our participants' experiences in developing and sharing artifacts in MDE research and the challenges encountered while doing so. We then asked them to prioritize each of our guidelines as essential, desirable, or unnecessary. Finally, we asked them to evaluate our guidelines with respect to clarity, completeness, and relevance. In each of these dimensions, our guidelines were assessed positively by more than 92% of the participants.
To foster the reproducibility and reusability of our results, we make the full set of generated artifacts available in an open repository at https://mdeartifacts.github.io/.

Submitted 15 November, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: 12 pages, 5 figures, 7 tables, accepted for publication at the ACM/IEEE 24th International Conference on Model Driven Engineering Languages and Systems (MODELS 2021), Foundations Track - Technical Papers

arXiv:2107.13537

A Probabilistic Approach to the Reliability Analysis of Data Generated in Multiplex Sequencing on the ABI SOLiD Platform

Authors: Fabio M. F. Lobato, Carlos D. N. Damasceno, Péricles L. Machado, Nandamudi L. Vijaykumar, André R. dos Santos, Sylvain H. Darnet, André N. A. Gonçalves, Dayse O. de Alencar, Ádamo L. de Santana

Abstract: Next-generation sequencers, such as the Illumina and SOLiD platforms, generate large amounts of data, commonly more than 10 gigabytes of text files. In particular, the SOLiD platform allows multiple samples to be sequenced in a single run, called a multiplex run, through a tagging system called Barcode. This feature requires a computational process to separate the data by sample, because the sequencer provides a mixture of all samples in a single output. This process must be reliable to avoid errors that could compromise further analysis. In this context, we identified the need to develop a probabilistic model capable of assigning a degree of confidence to the tagging system used in multiplex sequencing. The results confirmed the adequacy of the resulting model, which, among other things, can guide data filtering and the evaluation of the sequencing protocol used.

Submitted 11 August, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: 8 pages, 4 figures, 2 tables, Published in Portuguese in the proceedings (Anais) of the XLIII Simpósio Brasileiro de Pesquisa Operacional (SBPO 2011), 2011. URL: http://www.din.uem.br/sbpo/sbpo2011/pdf/87903.pdf
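The abstract does not describe the probabilistic model itself. Purely as a hedged illustration of assigning confidence during demultiplexing, the Python sketch below assumes per-base Phred quality scores are available for each barcode read and that base-call errors are independent. It assigns a read to the nearest known barcode by Hamming distance and reports the probability that all barcode bases were read correctly, the product of (1 - 10^(-Q/10)) over the bases. The barcodes, thresholds, and function names are hypothetical, the confidence reflects only base-call quality, and the sketch ignores SOLiD color-space encoding. It is not the authors' model.

```python
from typing import Dict, List, Optional, Tuple


def phred_error(q: int) -> float:
    """Base-call error probability for a Phred quality score Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def barcode_confidence(qualities: List[int]) -> float:
    """Probability that all barcode bases are correct, assuming independent errors."""
    confidence = 1.0
    for q in qualities:
        confidence *= 1.0 - phred_error(q)
    return confidence


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode: str, qualities: List[int], barcodes: Dict[str, str],
                  max_mismatches: int = 1,
                  min_confidence: float = 0.9) -> Tuple[Optional[str], float]:
    """Return (sample, confidence); sample is None when the read is ambiguous,
    too far from every barcode, or below the confidence threshold."""
    confidence = barcode_confidence(qualities)
    distances = {name: hamming(read_barcode, seq)
                 for name, seq in barcodes.items() if len(seq) == len(read_barcode)}
    if not distances:
        return None, confidence
    best = min(distances.values())
    winners = [name for name, d in distances.items() if d == best]
    if best > max_mismatches or len(winners) > 1 or confidence < min_confidence:
        return None, confidence
    return winners[0], confidence


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"sample1": "ACGTAC", "sample2": "TTGACA"}
print(assign_sample("ACGTAC", [30] * 6, barcodes))  # high quality, exact match
print(assign_sample("ACGTTC", [30] * 6, barcodes))  # one mismatch, still unique
print(assign_sample("ACGTAC", [10] * 6, barcodes))  # rejected: low confidence
```

Reads rejected by such a filter can be set aside rather than forced into a sample, which mirrors the abstract's point that a confidence measure can guide data filtering.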

arXiv:2107.12884

SimCleaner -- A Database Standardization System Using Similarity Functions

Authors: Carlos Diego Nascimento Damasceno, Fabio Manoel França Lobato, Elton Rocha Moutinho, Arilene Santos de França, Ivan Ikikame de Oliveira, Ádamo Lima de Santana

Abstract: The Knowledge Discovery in Databases (KDD) process enables the detection of patterns in databases, but this analysis may be compromised if the database is inconsistent, which makes data cleaning techniques necessary. This paper presents a tool based on similarity functions to help preprocess databases. The tool performed efficiently in standardizing a database of the Public Security System of the State of Pará and may be reused with other databases and other data mining projects.

Submitted 11 August, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: 6 pages, 5 figures, 1 table, Published in Portuguese in the proceedings (Anais) of the XIV Semana de Informática (SEMINF) e Escola Regional de Informática Norte (ERIN), 2011
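The abstract does not specify which similarity functions SimCleaner uses. As a minimal sketch of the general idea only, the Python code below assumes the standard-library `difflib.SequenceMatcher.ratio` as the similarity function, accent- and case-insensitive normalization, a hypothetical 0.8 threshold, and a hypothetical list of canonical values. `normalize`, `similarity`, and `standardize` are illustrative names, not SimCleaner's API.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, Iterable, List


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's ratio on normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def standardize(values: Iterable[str], canonical: List[str],
                threshold: float = 0.8) -> Dict[str, str]:
    """Map each raw value to its most similar canonical form, or keep it unchanged
    when no canonical value reaches the threshold."""
    mapping: Dict[str, str] = {}
    for value in values:
        best = max(canonical, key=lambda c: similarity(value, c))
        mapping[value] = best if similarity(value, best) >= threshold else value
    return mapping


# Hypothetical municipality names as they might appear in raw records.
canonical = ["Belém", "Ananindeua", "Marabá"]
raw = ["belem", "BELÉM ", "Anan1ndeua", "Santarém"]
for original, standard in standardize(raw, canonical).items():
    print(f"{original!r} -> {standard!r}")
```

Values with no sufficiently similar canonical form (here, `Santarém`) are left unchanged for manual review rather than forced onto the nearest match; the threshold trades off false merges against missed corrections.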

Showing 1–7 of 7 results for author: Damasceno, C D N