Search | arXiv e-print repository

arXiv:2501.19222 [pdf, other]

Development and Evolution of Xtext-based DSLs on GitHub: An Empirical Investigation

Authors: Weixing Zhang, Daniel Strüber, Regina Hebig

Abstract: Domain-specific languages (DSLs) play a crucial role in facilitating a wide range of software development activities in the context of model-driven engineering (MDE). However, a systematic understanding of their evolution is lacking, which hinders methodology and tool development. To address this gap, we performed a comprehensive investigation into the development and evolution of textual DSLs created with Xtext, a particularly widely used language workbench in MDE. We systematically identified and analyzed 1002 GitHub repositories containing Xtext-related projects. A manual classification of the repositories brought forward 226 that contain a fully developed language. These were further categorized into 18 application domains, where we examined DSL artifacts and the availability of example instances. We explored DSL development practices, including development scenarios, evolution activities, and co-evolution of related artifacts. We observed that DSLs are used more, evolve faster, and are maintained longer in specific domains, such as Data Management and Databases. We identified DSL grammar definitions in 722 repositories, but only a third provided textual instances, with most utilizing over 60% of grammar rules. We found that most analyzed DSLs followed a grammar-driven approach, though some adopted a metamodel-driven approach. Additionally, we observed a trend of retrofitting existing languages in Xtext, demonstrating its flexibility beyond new DSL creation.
We found that in most DSL development projects, updates to grammar definitions and example instances are very frequent, and most of the evolution activities can be classified as "perfective" changes. To support the research in the model-driven engineering community, we contribute a dataset of repositories with meta-information, helping to develop improved tools for DSL evolution.

Submitted 31 January, 2025; originally announced January 2025.

arXiv:2407.09895 [pdf, other]

EATXT: A textual concrete syntax for EAST-ADL

Authors: Weixing Zhang, Jörg Holtmann, Daniel Strüber, Jan-Philipp Steghöfer

Abstract: Blended modeling is an approach that enables users to interact with a model via multiple notations. In this context, there is a growing need for open-source industry-grade exemplars of languages with available language engineering artifacts, in particular, editors and notations for supporting the creation of models based on a single metamodel in different representations (e.g., textual, graphical, and tabular ones). These exemplars can support the development of advanced solutions to address the practical challenges posed by blended modeling requirements. As one such exemplar, this paper introduces EATXT, a textual concrete syntax for automotive architecture modeling with EAST-ADL, developed in cooperation with an industry partner in the automotive domain. The EATXT editor is based on Xtext and provides basic and advanced features, such as an improved content-assist and serialization specifically addressing blended modeling requirements. We present the editor features and architecture, the implementation approach, and previous use of EATXT in research. The EATXT editor is publicly available, rendering it a valuable resource for language developers.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2401.17351 [pdf, other]

Supporting Meta-model-based Language Evolution and Rapid Prototyping with Automated Grammar Optimization

Authors: Weixing Zhang, Jörg Holtmann, Daniel Strüber, Regina Hebig, Jan-Philipp Steghöfer

Abstract: In model-driven engineering, developing a textual domain-specific language (DSL) involves constructing a meta-model, which defines an underlying abstract syntax, and a grammar, which defines the concrete syntax for the DSL. Language workbenches such as Xtext allow the grammar to be automatically generated from the meta-model, yet the generated grammar usually needs to be manually optimized to improve its usability. When the meta-model changes during rapid prototyping or language evolution, it can become necessary to re-generate the grammar and optimize it again, causing repeated effort and potential for errors. In this paper, we present GrammarOptimizer, an approach for optimizing generated grammars in the context of meta-model-based language evolution. To reduce the effort for language engineers during rapid prototyping and language evolution, it offers a catalog of configurable grammar optimization rules. Once configured, these rules can be automatically applied and re-applied after future evolution steps, greatly reducing redundant manual effort. In addition, some of the supported optimizations can globally change the style of concrete syntax elements, further significantly reducing the effort for manual optimizations. The grammar optimization rules were extracted from a comparison of generated and existing, expert-created grammars, based on seven available DSLs.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 34 pages

arXiv:2310.01039 [pdf, other]

doi 10.1007/s10664-024-10596-9

Software Reconfiguration in Robotics

Authors: Sven Peldszus, Davide Brugali, Daniel Strüber, Patrizio Pelliccione, Thorsten Berger

Abstract: Robots often need to be reconfigurable (to customize, calibrate, or optimize robots operating in varying environments with different hardware). A particular challenge in robotics is the automated and dynamic reconfiguration to load and unload software components, as well as parameterizing them. Over the last decades, a large variety of software reconfiguration techniques has been presented in the literature, many specifically for robotics systems. Many robotics frameworks also support reconfiguration. Unfortunately, there is a lack of empirical data on the actual use of reconfiguration techniques in real robotics projects and on their realization in robotics frameworks. To advance reconfiguration techniques and support their adoption, we need to improve our empirical understanding of them in practice. We present a study of automated reconfiguration at runtime in the robotics domain. We determine the state-of-the-art by reviewing 78 relevant publications on reconfiguration. We determine the state-of-practice by analyzing how four major robotics frameworks support reconfiguration, and how reconfiguration is realized in 48 robotics (sub-)systems. We contribute a detailed analysis of the design space of reconfiguration techniques. We identify trends and research gaps. Our results show a significant discrepancy between the state-of-the-art and the state-of-practice. While the scientific community focuses on complex structural reconfiguration, only parameter reconfiguration is widely used in practice.
Our results support practitioners in realizing reconfiguration in robotics systems, and they support researchers and tool builders in creating more effective reconfiguration techniques that are adopted in practice.

Submitted 9 April, 2025; v1 submitted 2 October, 2023; originally announced October 2023.

Journal ref: Empir Software Eng 30, 94 (2025)

arXiv:2309.04347 [pdf, other]

A Rapid Prototyping Language Workbench for Textual DSLs based on Xtext: Vision and Progress

Authors: Weixing Zhang, Jan-Philipp Steghöfer, Regina Hebig, Daniel Strüber

Abstract: Metamodel-based DSL development in language workbenches like Xtext allows language engineers to focus more on metamodels and domain concepts rather than grammar details. However, the grammar generated from metamodels often requires manual modification, which can be tedious and time-consuming. Especially when it comes to rapid prototyping and language evolution, the grammar will be generated repeatedly, which means that language engineers need to repeat such manual modifications again and again. Previous work introduced GrammarOptimizer, which automatically improves the generated grammar using optimization rules. However, the optimization rules need to be configured manually, which lacks user-friendliness and convenience. In this paper, we present our vision for and current progress towards a language workbench that integrates GrammarOptimizer's grammar optimization rules to support rapid prototyping and evolution of metamodel-based languages. It provides a visual configuration of optimization rules and a real-time preview of the effects of grammar optimization to address the limitations of GrammarOptimizer. Furthermore, it supports the inference of a grammar based on examples from model instances and offers a selection of language styles. These features aim to enhance the automation level of metamodel-based DSL development with Xtext and assist language engineers in iterative development and rapid prototyping. Our paper discusses the potential and applications of this language workbench, as well as how it fills the gaps in existing language workbenches.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 6 pages, 3 figures

arXiv:2308.13809 [pdf, ps, other]

The complexity paradox: An analysis of modeling education through the lens of complexity science

Authors: Daniel Strüber

Abstract: Modeling seeks to tame complexity during software development, by supporting design, analysis, and stakeholder communication. Paradoxically, experiences made by educators indicate that students often perceive modeling as adding complexity, instead of reducing it. In this position paper, I analyse modeling education from the lens of complexity science, a theoretical framework for the study of complex systems. I revisit pedagogical literature where complexity science has been used as a framework for general education and subject-specific education in disciplines such as medicine, project management, and sustainability. I revisit complexity-related challenges from modeling education literature, discuss them in the light of complexity and present recommendations for taming complexity when teaching modeling.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 5 pages; accepted for publication at the IEEE/ACM International Conference on Modeling Driven Engineering Languages and Systems (MODELS), Educators Symposium

arXiv:2305.03432 [pdf, other]

Finding the Right Way to Rome: Effect-oriented Graph Transformation

Authors: Jens Kosiol, Daniel Strüber, Gabriele Taentzer, Steffen Zschaler

Abstract: Many applications of graph transformation require rules that change a graph without introducing new consistency violations. When designing such rules, it is natural to think about the desired outcome state, i.e., the desired effect, rather than the specific steps required to achieve it; these steps may vary depending on the specific rule-application context. Existing graph-transformation approaches either require a separate rule to be written for every possible application context or lack the ability to constrain the maximal change that a rule will create. We introduce effect-oriented graph transformation, shifting the semantics of a rule from specifying actions to representing the desired effect. A single effect-oriented rule can encode a large number of induced classic rules. Which of the potential actions is executed depends on the application context; ultimately, all ways lead to Rome. If a graph element to be deleted (created) by a potential action is already absent (present), this action need not be performed because the desired outcome is already present. We formally define effect-oriented graph transformation, show how matches can be computed without explicitly enumerating all induced classic rules, and report on a prototypical implementation of effect-oriented graph transformation in Henshin.

Submitted 5 May, 2023; originally announced May 2023.

Comments: 27 pages, 7 figures; extended version of the paper accepted for publication at ICGT '23

arXiv:2209.15620 [pdf, other]

doi 10.1007/978-3-031-15629-8_8

Family-Based Fingerprint Analysis: A Position Paper

Authors: Carlos Diego Nascimento Damasceno, Daniel Strüber

Abstract: Thousands of vulnerabilities are reported on a monthly basis to security repositories, such as the National Vulnerability Database. Among these vulnerabilities, software misconfiguration is one of the top 10 security risks for web applications. With this large influx of vulnerability reports, software fingerprinting has become a highly desired capability to discover distinctive and efficient signatures and recognize reportedly vulnerable software implementations. Due to the exponential worst-case complexity of fingerprint matching, designing more efficient methods for fingerprinting becomes highly desirable, especially for variability-intensive systems where optional features add another exponential factor to its analysis. This position paper presents our vision of a framework that lifts model learning and family-based analysis principles to software fingerprinting. In this framework, we propose unifying databases of signatures into a featured finite state machine and using presence conditions to specify whether and in which circumstances a given input-output trace is observed. We believe feature-based signatures can aid performance improvements by reducing the size of fingerprints under analysis.

Submitted 27 September, 2022; originally announced September 2022.

Comments: Paper published in the Proceedings A Journey from Process Algebra via Timed Automata to Model Learning: Essays Dedicated to Frits Vaandrager on the Occasion of His 60th Birthday 2022

arXiv:2204.12918 [pdf, other]

We're Not Gonna Break It! Consistency-Preserving Operators for Efficient Product Line Configuration

Authors: Jose-Miguel Horcas, Daniel Strüber, Alexandru Burdusel, Jabier Martinez, Steffen Zschaler

Abstract: When configuring a software product line, finding a good trade-off between multiple orthogonal quality concerns is a challenging multi-objective optimisation problem. State-of-the-art solutions based on search-based techniques create invalid configurations in intermediate steps, requiring additional repair actions that reduce the efficiency of the search. In this work, we introduce consistency-preserving configuration operators (CPCOs)--genetic operators that maintain valid configurations throughout the entire search. CPCOs bundle coherent sets of changes: the activation or deactivation of a particular feature together with other (de)activations that are needed to preserve validity. In our evaluation, our instantiation of the IBEA algorithm with CPCOs outperforms two state-of-the-art tools for optimal product line configuration in terms of both speed and solution quality. The improvements are especially pronounced in large product lines with thousands of features.

Submitted 27 April, 2022; originally announced April 2022.

Comments: Accepted for publication in IEEE Transactions on Software Engineering (TSE). 16 pages, 10 figures; includes an appendix with 8 additional pages and 4 additional figures

arXiv:2112.01315 [pdf, other]

A Generator Framework For Evolving Variant-Rich Software

Authors: Christoph Derks, Daniel Strüber, Thorsten Berger

Abstract: Evolving software is challenging, even more so when it exists in many different variants. Such software evolves not only in time, but also in space--another dimension of complexity. While evolution in space is supported by a variety of product-line and variability management tools, many of which originate from research, their level of evaluation varies significantly, which threatens their relevance for practitioners and future research. Many tools have only been evaluated on ad hoc datasets, minimal examples or available preprocessor-based product lines, missing the early clone & own phases and the re-engineering into configurable platforms--large parts of the actual evolution lifecycle of variant-rich systems. Our long-term goal is to provide benchmarks to increase the maturity of evaluating such tools. However, providing manually curated benchmarks that cover the whole evolution lifecycle and that are detailed enough to serve as ground truths is challenging. We present the framework vpbench to generate source-code histories of variant-rich systems. Vpbench comprises several modular generators relying on evolution operators that systematically and automatically evolve real codebases and document the evolution in detail. We provide simple and more advanced generators--e.g., relying on code transplantation techniques to obtain whole features from external, real-world projects.
We define requirements and demonstrate how vpbench addresses them for the generated version histories, focusing on support for evolution in time and space, the generation of detailed meta-data about the evolution, also considering compileability and extensibility.

Submitted 2 December, 2021; originally announced December 2021.

Comments: 9 pages, 5 figures

arXiv:2109.02304 [pdf, other]

Towards Multi-Criteria Prioritization of Best Practices in Research Artifact Sharing

Authors: Carlos Diego Nascimento Damasceno, Isotilia Costa Melo, Daniel Strüber

Abstract: Research artifact sharing is known to strengthen the transparency of scientific studies. However, in the absence of common discipline-specific guidelines for artifact evaluation, subjective and conflicting expectations may arise and threaten artifact quality. In this paper, we discuss our preliminary ideas for a framework based on quality management principles (5W2H) that can aid in the establishment of common guidelines for artifact evaluation and sharing. Also, using the Analytic Hierarchy Process, we discuss how research communities could join efforts to aid the guidelines' adequacy to research priorities. These combined methodologies constitute a novelty for software engineering research which can foster research software sustainability.

Submitted 6 September, 2021; originally announced September 2021.

Comments: 5 pages, 2 figures, Emerging results paper published in the 1st Workshop on Open Science Practices for Software Engineering (OpenScienSE 2021)

arXiv:2108.08579 [pdf]

doi 10.1007/s10270-022-00991-5

Checking Security Compliance between Models and Code

Authors: Katja Tuma, Sven Peldszus, Daniel Strüber, Riccardo Scandariato, Jan Jürjens

Abstract: It is challenging to verify that the planned security mechanisms are actually implemented in the software. In the context of model-based development, the implemented security mechanisms must capture all intended security properties that were considered in the design models. Assuring this compliance manually is labor intensive and can be error-prone. This work introduces the first semi-automatic technique for secure data flow compliance checks between design models and code. We develop heuristic-based automated mappings between a design-level model (SecDFD, provided by humans) and a code-level representation (Program Model, automatically extracted from the implementation) in order to guide users in discovering compliance violations, and hence potential security flaws in the code. These mappings enable an automated, and project-specific static analysis of the implementation with respect to the desired security properties of the design model. We developed two types of security compliance checks and evaluated the entire approach on open source Java projects.

Submitted 18 March, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

arXiv:2108.04652 [pdf, other]

doi 10.1109/MODELS50736.2021.00036

Quality Guidelines for Research Artifacts in Model-Driven Engineering

Authors: Carlos Diego Nascimento Damasceno, Daniel Strüber

Abstract: Sharing research artifacts is known to help people to build upon existing knowledge, adopt novel contributions in practice, and increase the chances of papers receiving attention. In Model-Driven Engineering (MDE), openly providing research artifacts plays a key role, even more so as the community targets a broader use of AI techniques, which can only become feasible if large open datasets and confidence measures for their quality are available. However, the current lack of common discipline-specific guidelines for research data sharing opens the opportunity for misunderstandings about the true potential of research artifacts and subjective expectations regarding artifact quality. To address this issue, we introduce a set of guidelines for artifact sharing specifically tailored to MDE research. To design this guidelines set, we systematically analyzed general-purpose artifact sharing practices of major computer science venues and tailored them to the MDE domain. Subsequently, we conducted an online survey with 90 researchers and practitioners with expertise in MDE. We investigated our participants' experiences in developing and sharing artifacts in MDE research and the challenges encountered while doing so. We then asked them to prioritize each of our guidelines as essential, desirable, or unnecessary. Finally, we asked them to evaluate our guidelines with respect to clarity, completeness, and relevance. In each of these dimensions, our guidelines were assessed positively by more than 92% of the participants.
To foster the reproducibility and reusability of our results, we make the full set of generated artifacts available in an open repository at https://mdeartifacts.github.io/.

Submitted 15 November, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: 12 pages, 5 figures, 7 tables, accepted for publication at the ACM/IEEE 24th International Conference on Model Driven Engineering Languages and Systems (MODELS 2021), Foundations Track - Technical Papers

arXiv:2104.06161 [pdf, other]

Feature-Oriented Defect Prediction: Scenarios, Metrics, and Classifiers

Authors: Mukelabai Mukelabai, Stefan Strüder, Daniel Strüber, Thorsten Berger

Abstract: Several software defect prediction techniques have been developed over the past decades. These techniques predict defects at the granularity of typical software assets, such as components and files. In this paper, we investigate feature-oriented defect prediction: predicting defects at the granularity of features -- domain-entities that represent software functionality and often cross-cut software assets. Feature-oriented defect prediction can be beneficial since: (i) some features might be more error-prone than others, (ii) characteristics of defective features might be useful to predict other error-prone features, and (iii) feature-specific code might be prone to faults arising from feature interactions. We explore the feasibility and solution space for feature-oriented defect prediction. Our study relies on 12 software projects from which we analyzed 13,685 bug-introducing and corrective commits, and systematically generated 62,868 training and test datasets to evaluate classifiers, metrics, and scenarios. The datasets were generated based on the 13,685 commits, 81 releases, and 24,532 permutations of our 12 projects depending on the scenario addressed. We covered scenarios such as just-in-time (JIT) and cross-project defect prediction. Our results confirm the feasibility of feature-oriented defect prediction. We found the best performance (i.e., precision and robustness) when using the Random Forest classifier, with process and structure metrics.
Surprisingly, single-project JIT and release-level predictions had median AUC-ROC values greater than 95% and 90% respectively, contrary to studies that assert poor performance due to insufficient training data. We also found that a model trained on release-level data from one of the twelve projects could predict defect-proneness of features in the other eleven projects with median AUC-ROC of 82%, without retraining.

Submitted 13 April, 2021; originally announced April 2021.

Comments: 16 pages, 10 figures, 14 tables, journal

arXiv:2103.00437 [pdf, other]

Seamless Variability Management With the Virtual Platform

Authors: Wardah Mahmood, Daniel Strüber, Thorsten Berger, Ralf Lämmel, Mukelabai Mukelabai

Abstract: Customization is a general trend in software engineering, demanding systems that support variable stakeholder requirements. Two opposing strategies are commonly used to create variants: software clone & own and software configuration with an integrated platform. Organizations often start with the former, which is cheap, agile, and supports quick innovation, but does not scale. The latter scales by establishing an integrated platform that shares software assets between variants, but requires high up-front investments or risky migration processes. So, could we have a method that allows an easy transition or even combine the benefits of both strategies? We propose a method and tool that supports a truly incremental development of variant-rich systems, exploiting a spectrum between both opposing strategies. We design, formalize, and prototype the variability-management framework virtual platform. It bridges clone & own and platform-oriented development. Relying on programming-language-independent conceptual structures representing software assets, it offers operators for engineering and evolving a system, comprising: traditional, asset-oriented operators and novel, feature-oriented operators for incrementally adopting concepts of an integrated platform. The operators record meta-data that is exploited by other operators to support the transition. Among others, they eliminate expensive feature-location effort or the need to trace clones. Our evaluation simulates the evolution of a real-world, clone-based system, measuring its costs and benefits.

Submitted 2 March, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

Comments: 13 pages, 10 figures; accepted for publication at the 43rd International Conference on Software Engineering (ICSE 2021), main technical track

arXiv:2102.06919 [pdf, other]

Asset Management in Machine Learning: A Survey

Authors: Samuel Idowu, Daniel Strüber, Thorsten Berger

Abstract: Machine Learning (ML) techniques are becoming essential components of many software systems today, causing an increasing need to adapt traditional software engineering practices and tools to the development of ML-based software systems. This need is especially pronounced due to the challenges associated with the large-scale development and deployment of ML systems. Among the most commonly reported challenges during the development, production, and operation of ML-based systems are experiment management, dependency management, monitoring, and logging of ML assets. In recent years, we have seen several efforts to address these challenges as witnessed by an increasing number of tools for tracking and managing ML experiments and their assets. To facilitate research and practice on engineering intelligent systems, it is essential to understand the nature of the current tool support for managing ML assets. What kind of support is provided? What asset types are tracked? What operations are offered to users for managing those assets? We discuss and position ML asset management as an important discipline that provides methods and tools for ML assets as structures and the ML development activities as their operations. We present a feature-based survey of 17 tools with ML asset management support identified in a systematic search. We overview these tools' features for managing the different types of assets used for engineering ML-based systems and performing experiments. We found that most of the asset management support depends on traditional version control systems, while only a few tools support an asset granularity level that differentiates between important ML assets, such as datasets and models.

Submitted 17 February, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

Comments: 10 pages, 8 figures. Accepted for publication at ICSE-SEIP 2021: International Conference on Software Engineering, track on Software Engineering in Practice

arXiv:2012.11976 [pdf, other]

doi 10.1145/3412841.3442046

A Maturity Assessment Framework for Conversational AI Development Platforms

Authors: Johan Aronsson, Philip Lu, Daniel Strüber, Thorsten Berger

Abstract: Conversational Artificial Intelligence (AI) systems have recently sky-rocketed in popularity and are now used in many applications, from car assistants to customer support. The development of conversational AI systems is supported by a large variety of software platforms, all with similar goals, but different focus points and functionalities. A systematic foundation for classifying conversational AI platforms is currently lacking. We propose a framework for assessing the maturity level of conversational AI development platforms. Our framework is based on a systematic literature review, in which we extracted common and distinguishing features of various open-source and commercial (or in-house) platforms. Inspired by language reference frameworks, we identify different maturity levels that a conversational AI development platform may exhibit in understanding and responding to user inputs. Our framework can guide organizations in selecting a conversational AI development platform according to their needs, as well as help researchers and platform developers improve the maturity of their platforms.

Submitted 22 December, 2020; originally announced December 2020.

Comments: 10 pages, 10 figures. Accepted for publication at SAC 2021: ACM/SIGAPP Symposium On Applied Computing

arXiv:2012.02645 [pdf, other]

Supporting Round-Trip Data Migration for Web APIs: A Henshin Solution

Authors: Daniel Strüber

Abstract: We present a solution to the Round-Trip Migration case of the Transformation Tool Contest 2020, based on the Henshin model transformation language. The task is to support four scenarios of transformations between two versions of the same data metamodel, a problem inspired by the application scenario of Web API migration, where such a round-trip migration methodology might mitigate drawbacks of the conventional "instant" migration style. Our solution relies on Henshin's visual syntax, which seems well-suited to capture the problem on an intuitive level, since the syntax is already similar to the scenario illustrations in the case description. We discuss the five evaluation criteria expressiveness, comprehensibility, bidirectionality, performance, and reusability.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 5 pages, 5 figures; accepted for publication in the proceedings for Transformation Tool Contest (TTC) 2020

arXiv:2011.06244 [pdf, other]

A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

Authors: Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher Ahmed Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu, et al. (23 additional authors not shown)

Abstract: Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.

Submitted 13 October, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: Status: Accepted at Empirical Software Engineering

arXiv:2006.10608 [pdf, other]

doi 10.1145/3368089.3409743

Robotics Software Engineering: A Perspective from the Service Robotics Domain

Authors: Sergio García, Daniel Strüber, Davide Brugali, Thorsten Berger, Patrizio Pelliccione

Abstract: Robots that support humans by performing useful tasks (a.k.a., service robots) are booming worldwide. In contrast to industrial robots, the development of service robots comes with severe software engineering challenges, since they require high levels of robustness and autonomy to operate in highly heterogeneous environments. As a domain with critical safety implications, service robotics faces a need for sound software development practices. In this paper, we present the first large-scale empirical study to assess the state of the art and practice of robotics software engineering. We conducted 18 semi-structured interviews with industrial practitioners working in 15 companies from 9 different countries and a survey with 156 respondents (from 26 countries) from the robotics domain. Our results provide a comprehensive picture of (i) the practices applied by robotics industrial and academic practitioners, including processes, paradigms, languages, tools, frameworks, and reuse practices, (ii) the distinguishing characteristics of robotics software engineering, and (iii) recurrent challenges usually faced, together with adopted solutions. The paper concludes by discussing observations, derived hypotheses, and proposed actions for researchers and practitioners.

Submitted 8 September, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: 11 pages + 1 page for references, 3 figures, 3 tables, in proceedings of ESEC/FSE 2020

arXiv:2005.04162 [pdf, other]

Graph Consistency as a Graduated Property: Consistency-Sustaining and -Improving Graph Transformations

Authors: Jens Kosiol, Daniel Strüber, Gabriele Taentzer, Steffen Zschaler

Abstract: Where graphs are used for modelling and specifying systems, consistency is an important concern. To be a valid model of a system, the graph structure must satisfy a number of constraints. To date, consistency has primarily been viewed as a binary property: a graph either is or is not consistent with respect to a set of graph constraints. This has enabled the definition of notions such as constraint-preserving and constraint-guaranteeing graph transformations. Many practical applications - for example model repair or evolutionary search - implicitly assume a more graduated notion of consistency, but without an explicit formalisation only limited analysis of these applications is possible. In this paper, we introduce an explicit notion of consistency as a graduated property, depending on the number of constraint violations in a graph. We present two new characterisations of transformations (and transformation rules) enabling reasoning about the gradual introduction of consistency: while consistency-sustaining transformations do not decrease the consistency level, consistency-improving transformations strictly reduce the number of constraint violations. We show how these new definitions refine the existing concepts of constraint-preserving and constraint-guaranteeing transformations. To support a static analysis based on our characterisations, we present criteria for deciding which form of consistency ensuring transformations is induced by the application of a transformation rule. We illustrate our contributions in the context of an example from search-based model engineering.

Submitted 1 November, 2021; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: 23 pages, accepted for publication at the International Conference on Graph Transformation 2020. Typos corrected, heading for Table 2 clarified, wrong statement in Theorem 2 omitted

Showing 1–21 of 21 results for author: Strüber, D