Search | arXiv e-print repository

Linux Kernel Configurations at Scale: A Dataset for Performance and Evolution Analysis

Authors: Heraldo Borges, Juliana Alves Pereira, Djamel Eddine Khelladi, Mathieu Acher

Abstract: Configuring the Linux kernel to meet specific requirements, such as binary size, is highly challenging due to its immense complexity-with over 15,000 interdependent options evolving rapidly across different versions. Although several studies have explored sampling strategies and machine learning methods to understand and predict the impact of configuration options, the literature still lacks a com… ▽ More Configuring the Linux kernel to meet specific requirements, such as binary size, is highly challenging due to its immense complexity-with over 15,000 interdependent options evolving rapidly across different versions. Although several studies have explored sampling strategies and machine learning methods to understand and predict the impact of configuration options, the literature still lacks a comprehensive and large-scale dataset encompassing multiple kernel versions along with detailed quantitative measurements. To bridge this gap, we introduce LinuxData, an accessible collection of kernel configurations spanning several kernel releases, specifically from versions 4.13 to 5.8. This dataset, gathered through automated tools and build processes, comprises over 240,000 kernel configurations systematically labeled with compilation outcomes and binary sizes. By providing detailed records of configuration evolution and capturing the intricate interplay among kernel options, our dataset enables innovative research in feature subset selection, prediction models based on machine learning, and transfer learning across kernel versions. Throughout this paper, we describe how the dataset has been made easily accessible via OpenML and illustrate how it can be leveraged using only a few lines of Python code to evaluate AI-based techniques, such as supervised machine learning. We anticipate that this dataset will significantly enhance reproducibility and foster new insights into configuration-space analysis at a scale that presents unique opportunities and inherent challenges, thereby advancing our understanding of the Linux kernel's configurability and evolution. △ Less

Submitted 12 May, 2025; originally announced May 2025.

Journal ref: EASE 2025 - Evaluation and Assessment in Software Engineering, Jun 2025, Istanbul, Turkey

arXiv:2505.04670 [pdf, other]

LLM Code Customization with Visual Results: A Benchmark on TikZ

Authors: Charly Reux, Mathieu Acher, Djamel Eddine Khelladi, Olivier Barais, Clément Quinton

Abstract: With the rise of AI-based code generation, customizing existing code out of natural language instructions to modify visual results -such as figures or images -has become possible, promising to reduce the need for deep programming expertise. However, even experienced developers can struggle with this task, as it requires identifying relevant code regions (feature location), generating valid code va… ▽ More With the rise of AI-based code generation, customizing existing code out of natural language instructions to modify visual results -such as figures or images -has become possible, promising to reduce the need for deep programming expertise. However, even experienced developers can struggle with this task, as it requires identifying relevant code regions (feature location), generating valid code variants, and ensuring the modifications reliably align with user intent. In this paper, we introduce vTikZ, the first benchmark designed to evaluate the ability of Large Language Models (LLMs) to customize code while preserving coherent visual outcomes. Our benchmark consists of carefully curated vTikZ editing scenarios, parameterized ground truths, and a reviewing tool that leverages visual feedback to assess correctness. Empirical evaluation with stateof-the-art LLMs shows that existing solutions struggle to reliably modify code in alignment with visual intent, highlighting a gap in current AI-assisted code editing approaches. We argue that vTikZ opens new research directions for integrating LLMs with visual feedback mechanisms to improve code customization tasks in various domains beyond TikZ, including image processing, art creation, Web design, and 3D modeling. △ Less

Submitted 4 June, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

Journal ref: EASE 2025 - Evaluation and Assessment in Software Engineering, Jun 2025, Istanbul, Turkey. pp.1-10

arXiv:2503.16144 [pdf, other]

Unify and Triumph: Polyglot, Diverse, and Self-Consistent Generation of Unit Tests with LLMs

Authors: Djamel Eddine Khelladi, Charly Reux, Mathieu Acher

Abstract: Large language model (LLM)-based test generation has gained attention in software engineering, yet most studies evaluate LLMs' ability to generate unit tests in a single attempt for a given language, missing the opportunity to leverage LLM diversity for more robust testing. This paper introduces PolyTest, a novel approach that enhances test generation by exploiting polyglot and temperature-control… ▽ More Large language model (LLM)-based test generation has gained attention in software engineering, yet most studies evaluate LLMs' ability to generate unit tests in a single attempt for a given language, missing the opportunity to leverage LLM diversity for more robust testing. This paper introduces PolyTest, a novel approach that enhances test generation by exploiting polyglot and temperature-controlled diversity. PolyTest systematically leverages these properties in two complementary ways: (1) Cross-lingual test generation, where tests are generated in multiple languages at zero temperature and then unified; (2) Diverse test sampling, where multiple test sets are generated within the same language at a higher temperature before unification. A key insight is that LLMs can generate diverse yet contradicting tests -- same input, different expected outputs -- across languages and generations. PolyTest mitigates inconsistencies by unifying test sets, fostering self-consistency and improving overall test quality. Unlike single-language or single-attempt approaches, PolyTest enhances testing without requiring on-the-fly execution, making it particularly beneficial for weaker-performing languages. We evaluate PolyTest on Llama3-70B, GPT-4o, and GPT-3.5 using EvalPlus, generating tests in five languages (Java, C, Python, JavaScript, and a CSV-based format) at temperature 0 and sampling multiple sets at temperature 1. We observe that LLMs frequently generate contradicting tests across settings, and that PolyTest significantly improves test quality across all considered metrics -- number of tests, passing rate, statement/branch coverage (up to +9.01%), and mutation score (up to +11.23%). Finally, PolyTest outperforms Pynguin in test generation, passing rate, and mutation score. △ Less

Submitted 20 March, 2025; originally announced March 2025.

arXiv:2503.14079 [pdf, ps, other]

Testing Uniform Random Samplers: Methods, Datasets and Protocols

Authors: Olivier Zeyen, Maxime Cordy, Martin Gubri, Gilles Perrouin, Mathieu Acher

Abstract: Boolean formulae compactly encode huge, constrained search spaces. Thus, variability-intensive systems are often encoded with Boolean formulae. The search space of a variability-intensive system is usually too large to explore without statistical inference (e.g. testing). Testing every valid configuration is computationally expensive (if not impossible) for most systems. This leads most testing ap… ▽ More Boolean formulae compactly encode huge, constrained search spaces. Thus, variability-intensive systems are often encoded with Boolean formulae. The search space of a variability-intensive system is usually too large to explore without statistical inference (e.g. testing). Testing every valid configuration is computationally expensive (if not impossible) for most systems. This leads most testing approaches to sample a few configurations before analyzing them. A desirable property of such samples is uniformity: Each solution should have the same selection probability. Uniformity is the property that facilitates statistical inference. This property motivated the design of uniform random samplers, relying on SAT solvers and counters and achieving different trade-offs between uniformity and scalability. Though we can observe their performance in practice, judging the quality of the generated samples is different. Assessing the uniformity of a sampler is similar in nature to assessing the uniformity of a pseudo-random number (PRNG) generator. However, sampling is much slower and the nature of sampling also implies that the hyperspace containing the samples is constrained. This means that testing PRNGs is subject to fewer constraints than testing samplers. We propose a framework that contains five statistical tests which are suited to test uniform random samplers. Moreover, we demonstrate their use by testing seven samplers. Finally, we demonstrate the influence of the Boolean formula given as input to the samplers under test on the test results. △ Less

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2502.19116 [pdf, other]

doi 10.1145/3650212.3680382

Policy Testing with MDPFuzz (Replicability Study)

Authors: Quentin Mazouni, Helge Spieker, Arnaud Gotlieb, Mathieu Acher

Abstract: In recent years, following tremendous achievements in Reinforcement Learning, a great deal of interest has been devoted to ML models for sequential decision-making. Together with these scientific breakthroughs/advances, research has been conducted to develop automated functional testing methods for finding faults in black-box Markov decision processes. Pang et al. (ISSTA 2022) presented a black-bo… ▽ More In recent years, following tremendous achievements in Reinforcement Learning, a great deal of interest has been devoted to ML models for sequential decision-making. Together with these scientific breakthroughs/advances, research has been conducted to develop automated functional testing methods for finding faults in black-box Markov decision processes. Pang et al. (ISSTA 2022) presented a black-box fuzz testing framework called MDPFuzz. The method consists of a fuzzer whose main feature is to use Gaussian Mixture Models (GMMs) to compute coverage of the test inputs as the likelihood to have already observed their results. This guidance through coverage evaluation aims at favoring novelty during testing and fault discovery in the decision model. Pang et al. evaluated their work with four use cases, by comparing the number of failures found after twelve-hour testing campaigns with or without the guidance of the GMMs (ablation study). In this paper, we verify some of the key findings of the original paper and explore the limits of MDPFuzz through reproduction and replication. We re-implemented the proposed methodology and evaluated our replication in a large-scale study that extends the original four use cases with three new ones. Furthermore, we compare MDPFuzz and its ablated counterpart with a random testing baseline. We also assess the effectiveness of coverage guidance for different parameters, something that has not been done in the original evaluation. Despite this parameter analysis and unlike Pang et al.'s original conclusions, we find that in most cases, the aforementioned ablated Fuzzer outperforms MDPFuzz, and conclude that the coverage model proposed does not lead to finding more faults. △ Less

Submitted 26 February, 2025; originally announced February 2025.

Comments: Author's version. 12 pages, 7 figures

Journal ref: ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, Pages 1567 - 157

arXiv:2409.16480 [pdf, other]

Exploring Performance Trade-offs in JHipster

Authors: Edouard Guégain, Alexandre Bonvoisin, Clément Quinton, Mathieu Acher, Romain Rouvoy

Abstract: The performance of software systems remains a persistent concern in the field of software engineering. While traditional metrics like binary size and execution time have long been focal points for developers, the power consumption concern has gained significant attention, adding a layer of complexity to performance evaluation. Configurable software systems, with their potential for numerous config… ▽ More The performance of software systems remains a persistent concern in the field of software engineering. While traditional metrics like binary size and execution time have long been focal points for developers, the power consumption concern has gained significant attention, adding a layer of complexity to performance evaluation. Configurable software systems, with their potential for numerous configurations, further complicate this evaluation process. In this experience paper, we examine the impact of configurations on performance, specifically focusing on the web stack generator JHipster. Our goal is to understand how configuration choices within JHipster influence the performance of the generated system. We undertake an exhaustive analysis of JHipster by examining its configurations and their effects on system performance. Additionally, we explore individual configuration options to gauge their specific influence on performance. Through this process, we develop a comprehensive performance model for JHipster, enabling us to automate the identification of configurations that optimize specific performance metrics. In particular, we identify configurations that demonstrate near-optimal performance across multiple indicators and report on significant correlations between configuration choices within JHipster and the performance of generated systems. △ Less

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2403.15065 [pdf, other]

doi 10.1145/3644032.3644458

Testing for Fault Diversity in Reinforcement Learning

Authors: Quentin Mazouni, Helge Spieker, Arnaud Gotlieb, Mathieu Acher

Abstract: Reinforcement Learning is the premier technique to approach sequential decision problems, including complex tasks such as driving cars and landing spacecraft. Among the software validation and verification practices, testing for functional fault detection is a convenient way to build trustworthiness in the learned decision model. While recent works seek to maximise the number of detected faults, n… ▽ More Reinforcement Learning is the premier technique to approach sequential decision problems, including complex tasks such as driving cars and landing spacecraft. Among the software validation and verification practices, testing for functional fault detection is a convenient way to build trustworthiness in the learned decision model. While recent works seek to maximise the number of detected faults, none consider fault characterisation during the search for more diversity. We argue that policy testing should not find as many failures as possible (e.g., inputs that trigger similar car crashes) but rather aim at revealing as informative and diverse faults as possible in the model. In this paper, we explore the use of quality diversity optimisation to solve the problem of fault diversity in policy testing. Quality diversity (QD) optimisation is a type of evolutionary algorithm to solve hard combinatorial optimisation problems where high-quality diverse solutions are sought. We define and address the underlying challenges of adapting QD optimisation to the test of action policies. Furthermore, we compare classical QD optimisers to state-of-the-art frameworks dedicated to policy testing, both in terms of search efficiency and fault diversity. We show that QD optimisation, while being conceptually simple and generally applicable, finds effectively more diverse faults in the decision model, and conclude that QD-based policy testing is a promising approach. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: 11 pages, 4 figures, 1 algorithm, AST @ ICSE 2024

arXiv:2312.09680 [pdf, other]

A Review of Validation and Verification of Neural Network-based Policies for Sequential Decision Making

Authors: Q. Mazouni, H. Spieker, A. Gotlieb, M. Acher

Abstract: In sequential decision making, neural networks (NNs) are nowadays commonly used to represent and learn the agent's policy. This area of application has implied new software quality assessment challenges that traditional validation and verification practises are not able to handle. Subsequently, novel approaches have emerged to adapt those techniques to NN-based policies for sequential decision mak… ▽ More In sequential decision making, neural networks (NNs) are nowadays commonly used to represent and learn the agent's policy. This area of application has implied new software quality assessment challenges that traditional validation and verification practises are not able to handle. Subsequently, novel approaches have emerged to adapt those techniques to NN-based policies for sequential decision making. This survey paper aims at summarising these novel contributions and proposing future research directions. We conducted a literature review of recent research papers (from 2018 to beginning of 2023), whose topics cover aspects of the test or verification of NN-based policies. The selection has been enriched by a snowballing process from the previously selected papers, in order to relax the scope of the study and provide the reader with insight into similar verification challenges and their recent solutions. 18 papers have been finally selected. Our results show evidence of increasing interest for this subject. They highlight the diversity of both the exact problems considered and the techniques used to tackle them. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures, RJCIA 2023

arXiv:2210.14699 [pdf, ps, other]

Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic?

Authors: Jean-Baptiste Döderlein, Nguessan Hermann Kouadio, Mathieu Acher, Djamel Eddine Khelladi, Benoit Combemale

Abstract: Language models are promising solutions for tackling increasing complex problems. In software engineering, they recently gained attention in code assistants, which generate programs from a natural language task description (prompt). They have the potential to save time and effort but remain poorly understood, limiting their optimal use. In this article, we investigate the impact of input variation… ▽ More Language models are promising solutions for tackling increasing complex problems. In software engineering, they recently gained attention in code assistants, which generate programs from a natural language task description (prompt). They have the potential to save time and effort but remain poorly understood, limiting their optimal use. In this article, we investigate the impact of input variations on two configurations of a language model, focusing on parameters such as task description, surrounding context, model creativity, and the number of generated solutions. We design specific operators to modify these inputs and apply them to three LLM-based code assistants (Copilot, Codex, StarCoder2) and two benchmarks representing algorithmic problems (HumanEval, LeetCode). Our study examines whether these variations significantly affect program quality and how these effects generalize across models. Our results show that varying input parameters can greatly improve performance, achieving up to 79.27% success in one-shot generation compared to 22.44% for Codex and 31.1% for Copilot in default settings. Actioning this potential in practice is challenging due to the complex interplay in our study - the optimal settings for temperature, prompt, and number of generated solutions vary by problem. Reproducing our study with StarCoder2 confirms these findings, indicating they are not model-specific. We also uncover surprising behaviors (e.g., fully removing the prompt can be effective), revealing model brittleness and areas for improvement. △ Less

Submitted 23 June, 2025; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: 53 pages, 3 Figures (not counted the subfigures), 16 Tables

MSC Class: 68T50

arXiv:2210.14082 [pdf, other]

Specialization of Run-time Configuration Space at Compile-time: An Exploratory Study

Authors: Xhevahire Tërnava, Mathieu Acher, Benoit Combemale

Abstract: Numerous software systems are highly configurable through run-time options, such as command-line parameters. Users can tune some of the options to meet various functional and non-functional requirements such as footprint, security, or execution time. However, some options are never set for a given system instance, and their values remain the same whatever the use cases of the system. Herein, we de… ▽ More Numerous software systems are highly configurable through run-time options, such as command-line parameters. Users can tune some of the options to meet various functional and non-functional requirements such as footprint, security, or execution time. However, some options are never set for a given system instance, and their values remain the same whatever the use cases of the system. Herein, we design a controlled experiment in which the system's run-time configuration space can be specialized at compile-time and combinations of options can be removed on demand. We perform an in-depth study of the well-known x264 video encoder and quantify the effects of its specialization to its non-functional properties, namely on binary size, attack surface, and performance while ensuring its validity. Our exploratory study suggests that the configurable specialization of a system has statistically significant benefits on most of its analysed non-functional properties, which benefits depend on the number of the debloated options. While our empirical results and insights show the importance of removing code related to unused run-time options to improve software systems, an open challenge is to further automate the specialization process. △ Less

Submitted 25 October, 2022; originally announced October 2022.

Comments: 14 pages, 3 Figures (not counted the subfigures), 5 Tables, and 1 Algorithm

arXiv:2112.07279 [pdf, other]

The Interaction between Inputs and Configurations fed to Software Systems: an Empirical Study

Authors: Luc Lesoil, Mathieu Acher, Arnaud Blouin, Jean-Marc Jézéquel

Abstract: Widely used software systems such as video encoders are by necessity highly configurable, with hundreds or even thousands of options to choose from. Their users often have a hard time finding suitable values for these options (i.e. finding a proper configuration of the software system) to meet their goals for the tasks at hand, e.g. compress a video down to a certain size. One dimension of the pro… ▽ More Widely used software systems such as video encoders are by necessity highly configurable, with hundreds or even thousands of options to choose from. Their users often have a hard time finding suitable values for these options (i.e. finding a proper configuration of the software system) to meet their goals for the tasks at hand, e.g. compress a video down to a certain size. One dimension of the problem is of course that performance depends on the input data: a video as input to an encoder like x264 or a file system fed to a tool like xz. To achieve good performance, users should therefore take into account both dimensions of (1) software variability and (2) input data. In this problem-statement paper, we conduct a large study over 8 configurable systems that quantifies the existing interactions between input data and configurations of software systems. The results exhibit that (1) inputs fed to software systems interact with their configuration options in non monotonous ways, significantly impacting their performance properties (2) tuning a software system for its input data makes it possible to multiply its performance by up to ten (3) input variability can jeopardize the relevance of performance predictive models for a field deployment. △ Less

Submitted 21 February, 2023; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 24 pages

Journal ref: Journal of Software and Systems 2023

arXiv:1909.07283 [pdf, other]

Towards Quality Assurance of Software Product Lines with Adversarial Configurations

Authors: Paul Temple, Mathieu Acher, Gilles Perrouin, Battista Biggio, Jean-marc Jezequel, Fabio Roli

Abstract: Software product line (SPL) engineers put a lot of effort to ensure that, through the setting of a large number of possible configuration options, products are acceptable and well-tailored to customers' needs. Unfortunately, options and their mutual interactions create a huge configuration space which is intractable to exhaustively explore. Instead of testing all products, machine learning techniq… ▽ More Software product line (SPL) engineers put a lot of effort to ensure that, through the setting of a large number of possible configuration options, products are acceptable and well-tailored to customers' needs. Unfortunately, options and their mutual interactions create a huge configuration space which is intractable to exhaustively explore. Instead of testing all products, machine learning techniques are increasingly employed to approximate the set of acceptable products out of a small training sample of configurations. Machine learning (ML) techniques can refine a software product line through learned constraints and a priori prevent non-acceptable products to be derived. In this paper, we use adversarial ML techniques to generate adversarial configurations fooling ML classifiers and pinpoint incorrect classifications of products (videos) derived from an industrial video generator. Our attacks yield (up to) a 100% misclassification rate and a drop in accuracy of 5%. We discuss the implications these results have on SPL quality assurance. △ Less

Submitted 16 September, 2019; originally announced September 2019.

Comments: This is a preview version of a paper accepted and presented at SPLC'19 that took place in Paris from 9th to 13th of September 2019. Some minor changes might appear compared to the one from the proceedings of the conference

arXiv:1906.03018 [pdf, other]

Learning Software Configuration Spaces: A Systematic Literature Review

Authors: Juliana Alves Pereira, Hugo Martin, Mathieu Acher, Jean-Marc Jézéquel, Goetz Botterweck, Anthony Ventresque

Abstract: Most modern software systems (operating systems like Linux or Android, Web browsers like Firefox or Chrome, video encoders like ffmpeg, x264 or VLC, mobile and cloud applications, etc.) are highly-configurable. Hundreds of configuration options, features, or plugins can be combined, each potentially with distinct functionality and effects on execution time, security, energy consumption, etc. Due t… ▽ More Most modern software systems (operating systems like Linux or Android, Web browsers like Firefox or Chrome, video encoders like ffmpeg, x264 or VLC, mobile and cloud applications, etc.) are highly-configurable. Hundreds of configuration options, features, or plugins can be combined, each potentially with distinct functionality and effects on execution time, security, energy consumption, etc. Due to the combinatorial explosion and the cost of executing software, it is quickly impossible to exhaustively explore the whole configuration space. Hence, numerous works have investigated the idea of learning it from a small sample of configurations' measurements. The pattern "sampling, measuring, learning" has emerged in the literature, with several practical interests for both software developers and end-users of configurable systems. In this survey, we report on the different application objectives (e.g., performance prediction, configuration optimization, constraint mining), use-cases, targeted software systems and application domains. We review the various strategies employed to gather a representative and cost-effective sample. We describe automated software techniques used to measure functional and non-functional properties of configurations. We classify machine learning algorithms and how they relate to the pursued application. Finally, we also describe how researchers evaluate the quality of the learning process. The findings from this systematic review show that the potential application objective is important; there are a vast number of case studies reported in the literature from the basis of several domains and software systems. Yet, the huge variant space of configurable systems is still challenging and calls to further investigate the synergies between artificial intelligence and software engineering. △ Less

Submitted 7 June, 2019; originally announced June 2019.

arXiv:1805.12021 [pdf, other]

Towards Adversarial Configurations for Software Product Lines

Authors: Paul Temple, Mathieu Acher, Battista Biggio, Jean-Marc Jézéquel, Fabio Roli

Abstract: Ensuring that all supposedly valid configurations of a software product line (SPL) lead to well-formed and acceptable products is challenging since it is most of the time impractical to enumerate and test all individual products of an SPL. Machine learning classifiers have been recently used to predict the acceptability of products associated with unseen configurations. For some configurations, a… ▽ More Ensuring that all supposedly valid configurations of a software product line (SPL) lead to well-formed and acceptable products is challenging since it is most of the time impractical to enumerate and test all individual products of an SPL. Machine learning classifiers have been recently used to predict the acceptability of products associated with unseen configurations. For some configurations, a tiny change in their feature values can make them pass from acceptable to non-acceptable regarding users' requirements and vice-versa. In this paper, we introduce the idea of leveraging these specific configurations and their positions in the feature space to improve the classifier and therefore the engineering of an SPL. Starting from a variability model, we propose to use Adversarial Machine Learning techniques to create new, adversarial configurations out of already known configurations by modifying their feature values. Using an industrial video generator we show how adversarial configurations can improve not only the classifier, but also the variability model, the variability implementation, and the testing oracle. △ Less

Submitted 30 May, 2018; originally announced May 2018.

arXiv:1710.07980 [pdf, other]

doi 10.1007/s10664-018-9635-4

Test them all, is it worth it? Assessing configuration sampling on the JHipster Web development stack

Authors: Axel Halin, Alexandre Nuttinck, Mathieu Acher, Xavier Devroey, Gilles Perrouin, Benoit Baudry

Abstract: Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations. This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them, if the effor… ▽ More Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations. This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them, if the effort required to do so remains acceptable. Not only this: we believe there is lots to be learned by systematically and exhaustively testing a configurable system. In this case study, we report on the first ever endeavour to test all possible configurations of an industry-strength, open source configurable software system, JHipster, a popular code generator for web applications. We built a testing scaffold for the 26,000+ configurations of JHipster using a cluster of 80 machines during 4 nights for a total of 4,376 hours (182 days) CPU time. We find that 35.70% configurations fail and we identify the feature interactions that cause the errors. We show that sampling strategies (like dissimilarity and 2-wise): (1) are more effective to find faults than the 12 default configurations used in the JHipster continuous integration; (2) can be too costly and exceed the available testing budget. We cross this quantitative analysis with the qualitative assessment of JHipster's lead developers. △ Less

Submitted 11 June, 2018; v1 submitted 22 October, 2017; originally announced October 2017.

Comments: Submitted to Empirical Software Engineering

arXiv:1607.04186 [pdf, other]

Large-scale Analysis of Chess Games with Chess Engines: A Preliminary Report

Authors: Mathieu Acher, François Esnault

Abstract: The strength of chess engines together with the availability of numerous chess games have attracted the attention of chess players, data scientists, and researchers during the last decades. State-of-the-art engines now provide an authoritative judgement that can be used in many applications like cheating detection, intrinsic ratings computation, skill assessment, or the study of human decision-mak… ▽ More The strength of chess engines together with the availability of numerous chess games have attracted the attention of chess players, data scientists, and researchers during the last decades. State-of-the-art engines now provide an authoritative judgement that can be used in many applications like cheating detection, intrinsic ratings computation, skill assessment, or the study of human decision-making. A key issue for the research community is to gather a large dataset of chess games together with the judgement of chess engines. Unfortunately the analysis of each move takes lots of times. In this paper, we report our effort to analyse almost 5 millions chess games with a computing grid. During summer 2015, we processed 270 millions unique played positions using the Stockfish engine with a quite high depth (20). We populated a database of 1+ tera-octets of chess evaluations, representing an estimated time of 50 years of computation on a single machine. Our effort is a first step towards the replication of research results, the supply of open data and procedures for exploring new directions, and the investigation of software engineering/scalability issues when computing billions of moves. △ Less

Submitted 28 April, 2016; originally announced July 2016.

arXiv:1502.04645 [pdf, other]

Synthesis of Attributed Feature Models From Product Descriptions: Foundations

Authors: Guillaume Bécan, Razieh Behjati, Arnaud Gotlieb, Mathieu Acher

Abstract: Feature modeling is a widely used formalism to characterize a set of products (also called configurations). As a manual elaboration is a long and arduous task, numerous techniques have been proposed to reverse engineer feature models from various kinds of artefacts. But none of them synthesize feature attributes (or constraints over attributes) despite the practical relevance of attributes for do… ▽ More Feature modeling is a widely used formalism to characterize a set of products (also called configurations). As a manual elaboration is a long and arduous task, numerous techniques have been proposed to reverse engineer feature models from various kinds of artefacts. But none of them synthesize feature attributes (or constraints over attributes) despite the practical relevance of attributes for documenting the different values across a range of products. In this report, we develop an algorithm for synthesizing attributed feature models given a set of product descriptions. We present sound, complete, and parametrizable techniques for computing all possible hierarchies, feature groups, placements of feature attributes, domain values, and constraints. We perform a complexity analysis w.r.t. number of features, attributes, configurations, and domain size. We also evaluate the scalability of our synthesis procedure using randomized configuration matrices. This report is a first step that aims to describe the foundations for synthesizing attributed feature models. △ Less

Submitted 16 February, 2015; originally announced February 2015.

arXiv:1404.6838 [pdf, other]

Metamorphic Domain-Specific Languages: A Journey Into the Shapes of a Language

Authors: Mathieu Acher, Benoit Combemale, Philippe Collet

Abstract: External or internal domain-specific languages (DSLs) or (fluent) APIs? Whoever you are -- a developer or a user of a DSL -- you usually have to choose your side; you should not! What about metamorphic DSLs that change their shape according to your needs? We report on our 4-years journey of providing the "right" support (in the domain of feature modeling), leading us to develop an external DSL, di… ▽ More External or internal domain-specific languages (DSLs) or (fluent) APIs? Whoever you are -- a developer or a user of a DSL -- you usually have to choose your side; you should not! What about metamorphic DSLs that change their shape according to your needs? We report on our 4-years journey of providing the "right" support (in the domain of feature modeling), leading us to develop an external DSL, different shapes of an internal API, and maintain all these languages. A key insight is that there is no one-size-fits-all solution or no clear superiority of a solution compared to another. On the contrary, we found that it does make sense to continue the maintenance of an external and internal DSL. The vision that we foresee for the future of software languages is their ability to be self-adaptable to the most appropriate shape (including the corresponding integrated development environment) according to a particular usage or task. We call metamorphic DSL such a language, able to change from one shape to another shape. △ Less

Submitted 27 April, 2014; originally announced April 2014.

Showing 1–18 of 18 results for author: Acher, M