-
Assessing the Linguistic Quality of REST APIs for IoT Applications
Authors:
Francis Palma,
Tobias Olsson,
Anna Wingkvist,
Javier Gonzalez-Huerta
Abstract:
Internet of Things (IoT) is a growing technology that relies on connected 'things' that gather data from peer devices and send data to servers via APIs (Application Programming Interfaces). The design quality of those APIs has a direct impact on their understandability and reusability. This study focuses on the linguistic design quality of REST APIs for IoT applications and assesses it by detecting linguistic patterns and antipatterns in those APIs. Linguistic antipatterns are poor practices in the naming, documentation, and choice of identifiers; linguistic patterns, in contrast, represent best practices in API design. Each linguistic pattern and its corresponding antipattern therefore form a contrasting pair. We propose the SARAv2 (Semantic Analysis of REST APIs version two) approach to perform syntactic and semantic analyses of REST APIs for IoT applications. Based on SARAv2, we develop the REST-Ling tool and empirically validate its detection results for nine linguistic antipatterns. We analyse 19 REST APIs for IoT applications. Our detection results show that linguistic antipatterns are prevalent and that REST-Ling can detect linguistic patterns and antipatterns in REST APIs for IoT applications with an average accuracy of over 80%. Moreover, the tool detects linguistic antipatterns in 8.396 seconds on average. We found that APIs generally follow good linguistic practices, although poor practices remain prevalent.
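To make "syntactic analysis" of REST URIs concrete, here is a minimal sketch of one commonly cited linguistic antipattern: a CRUD verb appearing in a URI path, as in `/getDevice/{id}`. This is not SARAv2's or REST-Ling's implementation. The verb list and the helper names (`split_identifier`, `is_crudy_uri`) are hypothetical. The check covers only the syntactic part and ignores the semantic analysis the abstract describes.

```python
import re

# Hypothetical list of CRUD-style verbs; not taken from SARAv2 or REST-Ling.
CRUD_VERBS = {"get", "create", "add", "update", "delete", "remove", "set", "put", "post"}


def split_identifier(segment: str) -> list[str]:
    """Split a URI segment on '-', '_', '.' and camelCase boundaries into lowercase tokens."""
    tokens = []
    for part in re.split(r"[-_.]", segment):
        # e.g. "getDevice" -> ["get", "Device"]
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [t.lower() for t in tokens if t]


def is_crudy_uri(uri: str) -> bool:
    """Flag a URI path whose segments contain a CRUD verb (purely syntactic check)."""
    path = uri.split("?", 1)[0]
    for segment in path.strip("/").split("/"):
        if segment.startswith("{"):
            continue  # skip path parameters such as {id}
        if any(tok in CRUD_VERBS for tok in split_identifier(segment)):
            return True
    return False
```

For example, `is_crudy_uri("/api/getDevice/{id}")` is flagged, while `/api/devices/{id}/readings` is not. A real detector would also need the semantic step, for instance to tell a resource named `settings` apart from the verb `set`.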
Submitted 13 May, 2022;
originally announced May 2022.
-
Aggregation as Unsupervised Learning and its Evaluation
Authors:
Maria Ulan,
Welf Löwe,
Morgan Ericsson,
Anna Wingkvist
Abstract:
Regression uses supervised machine learning to find a model that combines several independent variables to predict a dependent variable based on ground truth (labeled) data, i.e., tuples of independent and dependent variables (labels). Similarly, aggregation also combines several independent variables into a dependent variable. The dependent variable should preserve properties of the independent variables, e.g., the ranking or relative distance of the independent variable tuples, and/or represent a latent ground truth that is a function of these independent variables. However, no ground truth data is available for finding the aggregation model. Consequently, aggregation models are either data-agnostic or can only be derived with unsupervised machine learning approaches.
We introduce a novel unsupervised aggregation approach based on intrinsic properties of unlabeled training data, such as the cumulative probability distributions of the single independent variables and their mutual dependencies.
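As a rough illustration of aggregating with cumulative distributions learned from unlabeled data, the sketch below scores a tuple by the empirical joint CDF of the training data. The score is the fraction of training tuples that the query dominates in every variable. Because it uses the joint rather than per-variable distributions, it also reflects mutual dependencies. This is an assumed stand-in, not the authors' approach. It also assumes every variable is oriented so that larger values are "better".

```python
import numpy as np


def fit_ecdf_aggregator(train):
    """Return a function that aggregates tuples via the empirical joint CDF of `train`.

    train: array-like of shape (n_samples, n_variables), with every variable
    assumed oriented so that larger values are better. No labels are used.
    """
    train = np.asarray(train, dtype=float)

    def aggregate(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        # Broadcast to (n_queries, n_samples, n_variables) and test component-wise <=.
        dominated = np.all(train[None, :, :] <= x[:, None, :], axis=2)
        # Share of training tuples dominated by each query tuple, in [0, 1].
        return dominated.mean(axis=1)

    return aggregate


agg = fit_ecdf_aggregator([[1, 1], [2, 2], [3, 3], [1, 3]])
scores = agg([[2, 2], [3, 3], [0, 0]])
```

Here `(2, 2)` dominates two of the four training tuples, so it scores 0.5. The output is monotone in each input variable, which preserves rankings where one tuple dominates another.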
We present an empirical evaluation framework for assessing the proposed approach against other aggregation approaches from two perspectives: (i) how well the aggregated output represents properties of the input tuples, and (ii) how well the aggregated output can predict a latent ground truth. To this end, we use data sets intended for assessing supervised regression approaches, which contain explicit ground truth labels. The ground truth is not used for deriving the aggregation models; it only enables the assessment from perspective (ii). More specifically, we use regression data sets from the UCI machine learning repository and benchmark several data-agnostic and unsupervised aggregation approaches against ours.
The benchmark results indicate that our approach outperforms the other data-agnostic and unsupervised aggregation approaches. It is almost on par with linear regression.
Submitted 28 October, 2021;
originally announced October 2021.
-
To Automatically Map Source Code Entities to Architectural Modules with Naive Bayes
Authors:
Tobias Olsson,
Morgan Ericsson,
Anna Wingkvist
Abstract:
Background: Mapping a source code entity onto an architectural module is largely a manual task. Automating this process could increase the industrial use of static architecture conformance checking methods, such as reflexion modeling. Current techniques rely on user parameterization and a highly cohesive design. A machine learning approach could require fewer parameters and make better use of the available information to aid automatic mapping. Aim: We investigate how a classifier can be trained to map source code to architectural modules automatically. The classifier is trained with semantic and syntactic dependency information extracted from the source code and from architecture descriptions. We implement it using multinomial naive Bayes and evaluate it. Method: We perform experiments comparing the classifier with three state-of-the-art mapping functions on eight open-source Java systems with known ground-truth mappings. Results: The classifier outperforms the state of the art in all cases and provides a useful baseline for further research on semi-automatic incremental clustering. Conclusions: Machine learning is a useful approach for this task: it performs better and requires less parameterization than other approaches. Future work includes investigating problematic mappings and a more diverse set of subject systems.
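The following is a minimal, dependency-free sketch of multinomial naive Bayes with Laplace smoothing, applied to the mapping task. Each already-mapped entity is represented as a bag of tokens, and an unmapped entity is assigned the module with the highest posterior. The token lists and module names are hypothetical. The sketch simplifies the paper's setup, which uses semantic and syntactic dependency information from code and architecture descriptions.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes over token bags, with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)  # documents per module (for priors)
        self.token_counts = defaultdict(Counter)  # token frequencies per module
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tokens, c):
        """Unnormalized log P(c | tokens) = log P(c) + sum of log P(token | c)."""
        logp = math.log(self.class_counts[c] / self.n_docs)
        denom = self.total_tokens[c] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                logp += math.log((self.token_counts[c][t] + self.alpha) / denom)
        return logp

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


# Hypothetical entities already mapped to modules (tokens from names and dependencies).
mapped = [
    (["order", "repository", "sql", "connection"], "persistence"),
    (["customer", "dao", "sql", "query"], "persistence"),
    (["order", "controller", "http", "request"], "web"),
    (["login", "view", "http", "session"], "web"),
]
clf = MultinomialNB().fit([d for d, _ in mapped], [m for _, m in mapped])
```

With these data, `clf.predict(["invoice", "sql", "query"])` picks `persistence`. This also fits incremental use: newly confirmed mappings can simply be added to the training set before refitting.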
Submitted 20 September, 2021;
originally announced September 2021.
-
Using Source Code Density to Improve the Accuracy of Automatic Commit Classification into Maintenance Activities
Authors:
Sebastian Hönel,
Morgan Ericsson,
Welf Löwe,
Anna Wingkvist
Abstract:
Source code is changed for a reason, e.g., to adapt, correct, or perfect it. This reason can provide valuable insight into the development process but is rarely documented explicitly when the change is committed to a source code repository. Automatic commit classification uses features extracted from commits to estimate this reason.
We introduce source code density, a measure of the net size of a commit, and show how it improves the accuracy of automatic commit classification compared to previous size-based classifications. We also investigate how preceding generations of commits affect the class of a commit, and whether taking the code density of previous commits into account can improve the accuracy further.
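To show what a "net size" measure might look like, this sketch computes a density over a unified diff. It assumes density is the ratio of net changed lines to gross changed lines. Here "net" means added or removed lines that are neither blank nor comment-only. Both the definition and the comment prefixes are assumptions for illustration, not the paper's exact measure.

```python
# Hypothetical comment markers; a real tool would be language-aware.
COMMENT_PREFIXES = ("//", "#", "/*", "*")


def is_net_line(line: str) -> bool:
    """A changed line counts toward net size unless it is blank or comment-only."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(COMMENT_PREFIXES)


def code_density(diff_text: str) -> float:
    """Assumed density: net changed lines / gross changed lines in a unified diff."""
    gross = net = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not changes
        if line.startswith(("+", "-")):
            gross += 1
            if is_net_line(line[1:]):
                net += 1
    return net / gross if gross else 0.0


DIFF = """--- a/Foo.java
+++ b/Foo.java
@@ -1,3 +1,4 @@
 class Foo {
-  int x = 1;
+  int x = 2;
+
+  // a comment
 }"""
```

In this diff, four lines change but only two carry code, so `code_density(DIFF)` is 0.5. A low density suggests a commit dominated by formatting or documentation changes.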
We achieve up to 89% accuracy and a Kappa of 0.82 for the cross-project commit classification where the model is trained on one project and applied to other projects. Models trained on single projects yield accuracies of up to 93% with a Kappa approaching 0.90. The accuracy of the automatic commit classification has a direct impact on software (process) quality analyses that exploit the classification, so our improvements to the accuracy will also improve the confidence in such analyses.
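The Kappa reported above is a chance-corrected agreement score. It is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected from the label marginals. The sketch below is a small, hedged implementation of Cohen's kappa. The labels are hypothetical: `a`, `c` and `p` stand for adaptive, corrective and perfective commits.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e); undefined when p_e == 1."""
    n = len(y_true)
    # Observed agreement: share of commits where prediction matches the label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    return (p_o - p_e) / (1 - p_e)


y_true = ["a", "a", "c", "p", "c", "a"]
y_pred = ["a", "c", "c", "p", "c", "a"]
kappa = cohens_kappa(y_true, y_pred)  # (5/6 - 13/36) / (1 - 13/36) = 17/23
```

Here accuracy is 5/6 (about 0.83), but kappa is 17/23 (about 0.74). This is why the abstract reports Kappa alongside accuracy: it discounts agreement that would occur by chance given the class distribution.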
Submitted 28 May, 2020;
originally announced May 2020.