-
In Specs we Trust? Conformance-Analysis of Implementation to Specifications in Node-RED and Associated Security Risks
Authors:
Simon Schneider,
Komal Kashish,
Katja Tuma,
Riccardo Scandariato
Abstract:
Low-code development frameworks for IoT platforms offer a simple drag-and-drop mechanism to create applications for the billions of existing IoT devices without the need for extensive programming knowledge. The security of such software is crucial given the close integration of IoT devices in many highly sensitive areas such as healthcare or home automation. Node-RED is such a framework, where applications are built from nodes that are contributed by open-source developers. Its reliance on unvetted open-source contributions and its lack of security checks raise the concern that the applications could be vulnerable to attacks, thereby posing a security risk to end users. The low-code approach suggests that many users could lack the technical knowledge to mitigate, understand, or even recognize such security concerns. This paper focuses on "hidden" information flows in Node-RED nodes, meaning flows that are not captured by the specifications. They could (unknowingly or with malicious intent) cause leaks of sensitive information to unauthorized entities. We report the results of a conformance analysis of all nodes in the Node-RED framework, for which we compared the numbers of specified inputs and outputs of each node against the number of sources and sinks detected with CodeQL. The results show that 55% of all nodes exhibit more possible flows than are specified. A risk assessment of a subset of the nodes showed that 28% of them are associated with a high severity rating and 36% with a medium severity rating.
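As a rough illustration of the count-based comparison described above, the sketch below flags a node whenever static analysis reports more sources than specified inputs, or more sinks than specified outputs. It is a minimal sketch under stated assumptions, not the authors' tooling: the `NodeSummary` record, its field names, and the example nodes and counts are hypothetical, and in practice the detected counts would come from CodeQL queries that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class NodeSummary:
    """Hypothetical per-node record: specified ports vs. statically detected flow endpoints."""
    name: str
    specified_inputs: int
    specified_outputs: int
    detected_sources: int  # assumed to come from a static-analysis (e.g., CodeQL) query
    detected_sinks: int

    def extra_flows(self) -> int:
        """Count detected sources/sinks that the specification does not account for."""
        extra_sources = max(0, self.detected_sources - self.specified_inputs)
        extra_sinks = max(0, self.detected_sinks - self.specified_outputs)
        return extra_sources + extra_sinks

    def conforms(self) -> bool:
        """A node conforms if it exhibits no more possible flows than it specifies."""
        return self.extra_flows() == 0


def non_conforming_share(nodes: list[NodeSummary]) -> float:
    """Fraction of nodes with more detected flow endpoints than specified ports."""
    if not nodes:
        return 0.0
    return sum(not n.conforms() for n in nodes) / len(nodes)


# Toy data (made up, not from the study)
nodes = [
    NodeSummary("node-a", specified_inputs=1, specified_outputs=1, detected_sources=1, detected_sinks=2),
    NodeSummary("node-b", specified_inputs=1, specified_outputs=0, detected_sources=1, detected_sinks=0),
]
print(non_conforming_share(nodes))  # 0.5
```

Comparing counts rather than individual flows keeps the check cheap, but it can only reveal that some unspecified flow may exist, not which one.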
Submitted 13 February, 2025;
originally announced February 2025.
-
A Systematic Literature Review on Automated Exploit and Security Test Generation
Authors:
Quang-Cuong Bui,
Emanuele Iannone,
Maria Camporese,
Torge Hinrichs,
Catherine Tony,
László Tóth,
Fabio Palomba,
Péter Hegedűs,
Fabio Massacci,
Riccardo Scandariato
Abstract:
The exploit, or Proof of Concept, of a vulnerability plays an important role in developing superior vulnerability repair techniques, as it can be used as an oracle to verify the correctness of the patches generated by the tools. However, vulnerability exploits are often unavailable, and crafting them requires time and expert knowledge. Obtaining them through exploit generation techniques is another potential solution. The goal of this survey is to help researchers and practitioners understand the existing techniques for exploit generation by analyzing their characteristics and their usability in practice. We identify a list of exploit generation techniques from the literature and group them into four categories: automated exploit generation, security testing, fuzzing, and other techniques. Most of the techniques focus on memory-based vulnerabilities in C/C++ programs and web-based injection vulnerabilities in PHP and Java applications. We found only a few studies that publicly provided usable tools associated with their techniques.
Submitted 7 February, 2025;
originally announced February 2025.
-
A Match Made in Heaven? Matching Test Cases and Vulnerabilities With the VUTECO Approach
Authors:
Emanuele Iannone,
Quang-Cuong Bui,
Riccardo Scandariato
Abstract:
Software vulnerabilities are commonly detected via static analysis, penetration testing, and fuzzing. They can also be found by running unit tests - so-called vulnerability-witnessing tests - that stimulate the security-sensitive behavior with crafted inputs. Developing such tests is difficult and time-consuming; thus, automated data-driven approaches could help developers intercept vulnerabilities earlier. However, training and validating such approaches requires a lot of data, which is currently scarce. This paper introduces VUTECO, a deep learning-based approach for collecting instances of vulnerability-witnessing tests from Java repositories. VUTECO carries out two tasks: (1) the "Finding" task to determine whether a test case is security-related, and (2) the "Matching" task to relate a test case to the exact vulnerability it is witnessing. VUTECO successfully addresses the Finding task, achieving perfect precision and a 0.83 F0.5 score on validated test cases in VUL4J and returning 102 out of 145 (70%) correct security-related test cases from 244 open-source Java projects. Despite showing sufficiently good performance for the Matching task - i.e., 0.86 precision and a 0.68 F0.5 score - VUTECO failed to retrieve any valid match in the wild. Nevertheless, we observed that in almost all of the matches, the test case was still security-related despite being matched to the wrong vulnerability. In the end, VUTECO can help find vulnerability-witnessing tests, though matching them with the right vulnerability is yet to be solved; the findings obtained lay the groundwork for future research on the matter.
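For readers less familiar with the F0.5 score reported above: it is the F-beta measure with beta = 0.5, which weights precision more heavily than recall, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The helper below is a small sketch of that standard definition; the function name and the example values are illustrative and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be non-negative and beta positive")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denominator


# With beta = 0.5, a precision-heavy result scores higher than a recall-heavy one.
print(round(f_beta(0.8, 0.4), 3))  # 0.667
print(round(f_beta(0.4, 0.8), 3))  # 0.444
```

Choosing beta = 0.5 fits a mining setting where returning a wrong test case is costlier than missing one.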
Submitted 5 February, 2025;
originally announced February 2025.
-
A Taxonomy of Functional Security Features and How They Can Be Located
Authors:
Kevin Hermann,
Simon Schneider,
Catherine Tony,
Asli Yardim,
Sven Peldszus,
Thorsten Berger,
Riccardo Scandariato,
M. Angela Sasse,
Alena Naiakshina
Abstract:
Security must be considered in almost every software system. Unfortunately, selecting and implementing security features remains challenging due to the variety of security threats and possible countermeasures. While security standards are intended to help developers, they are usually too abstract and vague to help implement security features, or they merely help configure them. A resource that describes security features at an abstraction level between high-level (i.e., rather too general) and low-level (i.e., rather too specific) security standards could facilitate secure systems development. To realize security features, developers typically use external security frameworks to minimize implementation mistakes. Even then, developers still make mistakes, often resulting in security vulnerabilities. When security incidents occur or the system needs to be audited or maintained, it is essential to know the implemented security features and, more importantly, where they are located. This task, commonly referred to as feature location, is often tedious and error-prone. Therefore, we have to support long-term tracking of implemented security features.
We present a study of security features in the literature and their coverage in popular security frameworks. We contribute (1) a taxonomy of 68 functional implementation-level security features including a mapping to widely used security standards, (2) an examination of 21 popular security frameworks concerning which of these security features they provide, and (3) a discussion on the representation of security features in source code. Our taxonomy aims to aid developers in selecting appropriate security features and frameworks and relating them to security standards when they need to choose and implement security features for a software system.
Submitted 8 January, 2025;
originally announced January 2025.
-
The Good, the Bad, and the (Un)Usable: A Rapid Literature Review on Privacy as Code
Authors:
Nicolás E. Díaz Ferreyra,
Sirine Khelifi,
Nalin Arachchilage,
Riccardo Scandariato
Abstract:
Privacy and security are central to the design of information systems endowed with sound data protection and cyber resilience capabilities. Still, developers often struggle to incorporate these properties into software projects as they either lack proper cybersecurity training or do not consider them a priority. Prior work has tried to support privacy and security engineering activities through threat modeling methods for scrutinizing flaws in system architectures. Moreover, several techniques for the automatic identification of vulnerabilities and the generation of secure code implementations have also been proposed in the current literature. Conversely, such as-code approaches seem under-investigated in the privacy domain, with little work elaborating on (i) the automatic detection of privacy properties in source code or (ii) the generation of privacy-friendly code. In this work, we seek to characterize the current research landscape of Privacy as Code (PaC) methods and tools by conducting a rapid literature review. Our results suggest that PaC research is in its infancy, especially regarding the performance evaluation and usability assessment of the existing approaches. Based on these findings, we outline and discuss prospective research directions concerning empirical studies with software practitioners, the curation of benchmark datasets, and the role of generative AI technologies.
Submitted 2 March, 2025; v1 submitted 21 December, 2024;
originally announced December 2024.
-
Comparison of Static Analysis Architecture Recovery Tools for Microservice Applications
Authors:
Simon Schneider,
Alexander Bakhtin,
Xiaozhou Li,
Jacopo Soldani,
Antonio Brogi,
Tomas Cerny,
Riccardo Scandariato,
Davide Taibi
Abstract:
Architecture recovery tools help software engineers obtain an overview of the structure of their software systems during all phases of the software development life cycle. This is especially important for microservice applications because they consist of multiple interacting microservices, which makes it more challenging to oversee the architecture. Various tools and techniques for architecture recovery (also called architecture reconstruction) have been presented in academic and gray literature sources, but no overview and comparison of their accuracy exists.
This paper presents the results of a multivocal literature review with the goal of identifying architecture recovery tools for microservice applications and a comparison of the identified tools' architectural recovery accuracy. We focused on static tools since they can be integrated into fast-paced CI/CD pipelines. Thirteen such tools were identified from the literature, and nine of them could be executed and compared on their ability to detect different system characteristics. The best-performing tool exhibited an overall F1-score of 0.86. Additionally, the possibility of combining multiple tools to increase the recovery correctness was investigated, yielding a combination of four individual tools that achieves an F1-score of 0.91.
Registered report: The methodology of this study has been peer-reviewed and accepted as a registered report at MSR'24: arXiv:2403.06941
Submitted 11 December, 2024;
originally announced December 2024.
-
Designing Secure AI-based Systems: a Multi-Vocal Literature Review
Authors:
Simon Schneider,
Ananya Saha,
Emanuele Mezzi,
Katja Tuma,
Riccardo Scandariato
Abstract:
AI-based systems leverage recent advances in the field of AI/ML by combining traditional software systems with AI components. Applications are increasingly being developed in this way. Software engineers can usually rely on a plethora of supporting information on how to use and implement any given technology. For AI-based systems, however, such information is scarce. Specifically, guidance on how to securely design the architecture is not available to the same extent as for other systems. We present 16 architectural security guidelines for the design of AI-based systems that were curated via a multi-vocal literature review. The guidelines could support practitioners with actionable advice on the secure development of AI-based systems. Further, we mapped the guidelines to typical components of AI-based systems and observed high coverage, with 6 out of 8 generic components having at least one guideline associated with them.
Submitted 26 July, 2024;
originally announced July 2024.
-
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Authors:
Catherine Tony,
Nicolás E. Díaz Ferreyra,
Markus Mutas,
Salem Dhiff,
Riccardo Scandariato
Abstract:
Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. In parallel, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigations. OBJECTIVE: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. METHOD: First, we perform a systematic literature review to identify the existing prompting techniques that can be used for code generation tasks. A subset of these techniques is evaluated on GPT-3, GPT-3.5, and GPT-4 models for secure code generation. For this, we use an existing dataset consisting of 150 NL security-relevant code-generation prompts. RESULTS: Our work (i) classifies potential prompting techniques for code generation, (ii) adapts and evaluates a subset of the identified techniques for secure code generation tasks, and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on LLM-generated code security.
Submitted 26 February, 2025; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Managing Security Evidence in Safety-Critical Organizations
Authors:
Mazen Mohamad,
Jan-Philipp Steghöfer,
Eric Knauss,
Riccardo Scandariato
Abstract:
With the increasing prevalence of open and connected products, cybersecurity has become a serious issue in safety-critical domains such as the automotive industry. As a result, regulatory bodies have become more stringent in their requirements for cybersecurity, necessitating security assurance for products developed in these domains. In response, companies have implemented new or modified processes to incorporate security into their product development lifecycle, resulting in a large amount of evidence being created to support claims about the achievement of a certain level of security. However, managing evidence is not a trivial task, particularly for complex products and systems. This paper presents a qualitative interview study conducted in six companies on the maturity of managing security evidence in safety-critical organizations. We find that the current maturity of managing security evidence is insufficient for the increasing requirements set by certification authorities and standardization bodies. Organisations currently fail to identify relevant artifacts as security evidence and to manage this evidence on an organizational level. This is partly due to educational gaps and partly due to a lack of processes. The impact of AI on the management of security evidence is still an open question.
Submitted 26 April, 2024;
originally announced April 2024.
-
Comparison of Static Analysis Architecture Recovery Tools for Microservice Applications
Authors:
Simon Schneider,
Alexander Bakhtin,
Xiaozhou Li,
Jacopo Soldani,
Antonio Brogi,
Tomas Cerny,
Riccardo Scandariato,
Davide Taibi
Abstract:
Architecture recovery tools help software engineers obtain an overview of their software systems during all phases of the software development lifecycle. This is especially important for microservice applications because their distributed nature makes it more challenging to oversee the architecture. Various tools and techniques for this task are presented in academic and grey literature sources. Practitioners and researchers can benefit from a comprehensive overview of these tools and their abilities. However, no such overview exists that is based on executing the identified tools and assessing their outputs regarding effectiveness. With the study described in this paper, we plan to first identify static analysis architecture recovery tools for microservice applications via a multi-vocal literature review, and then execute them on a common dataset and compare the measured effectiveness in architecture recovery. We will focus on static approaches because they are also suitable for integration into fast-paced CI/CD pipelines.
Submitted 11 March, 2024;
originally announced March 2024.
-
What Can Self-Admitted Technical Debt Tell Us About Security? A Mixed-Methods Study
Authors:
Nicolás E. Díaz Ferreyra,
Mojtaba Shahin,
Mansooreh Zahedi,
Sodiq Quadri,
Ricardo Scandariato
Abstract:
Self-Admitted Technical Debt (SATD) encompasses a wide array of sub-optimal design and implementation choices reported in software artefacts (e.g., code comments and commit messages) by developers themselves. Such reports have been central to the study of software maintenance and evolution over the past decades. However, they can also be regarded as risky sources of information on potentially exploitable vulnerabilities and security flaws. This work investigates the security implications of SATD from a technical and developer-centred perspective. On the one hand, it analyses whether security pointers disclosed inside SATD sources can be used to characterise vulnerabilities in Open-Source Software (OSS) projects and repositories. On the other hand, it delves into developers' perspectives regarding the motivations behind this practice, its prevalence, and its potential negative consequences. We followed a mixed-methods approach consisting of (i) the analysis of a preexisting dataset containing 8,812 SATD instances and (ii) an online survey with 222 OSS practitioners. We gathered 201 SATD instances through the dataset analysis and mapped them to different Common Weakness Enumeration (CWE) identifiers. Overall, 25 different types of CWEs were spotted across commit messages, pull requests, code comments, and issue sections, of which 8 appear among MITRE's Top-25 most dangerous ones. The survey shows that software practitioners often place security pointers across SATD artefacts to promote a security culture among their peers and help them spot flaky code sections, among other motives. However, they also consider such a practice risky as it may facilitate vulnerability exploits. Our findings suggest that preserving the contextual integrity of security pointers disseminated across SATD artefacts is critical to safeguard both commercial and OSS solutions against zero-day attacks.
Submitted 2 March, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
CATMA: Conformance Analysis Tool For Microservice Applications
Authors:
Clinton Cao,
Simon Schneider,
Nicolás E. Díaz Ferreyra,
Sicco Verwer,
Annibale Panichella,
Riccardo Scandariato
Abstract:
The microservice architecture allows developers to divide the core functionality of their software system into multiple smaller services. However, this architectural style also makes it harder for them to debug and assess whether the system's deployment conforms to its implementation. We present CATMA, an automated tool that detects non-conformances between the system's deployment and implementation. It automatically visualizes and generates potential interpretations for the detected discrepancies. Our evaluation of CATMA shows promising results in terms of performance and providing useful insights. CATMA is available at https://cyber-analytics.nl/catma.github.io/, and a demonstration video is available at https://youtu.be/WKP1hG-TDKc.
Submitted 23 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
How Dataflow Diagrams Impact Software Security Analysis: an Empirical Experiment
Authors:
Simon Schneider,
Nicolás E. Díaz Ferreyra,
Pierre-Jean Quéval,
Georg Simhandl,
Uwe Zdun,
Riccardo Scandariato
Abstract:
Models of software systems are used throughout the software development lifecycle. Dataflow diagrams (DFDs), in particular, are well-established resources for security analysis. Many techniques, such as threat modelling, are based on DFDs of the analysed application. However, their impact on the performance of analysts in a security analysis setting has not been explored before. In this paper, we present the findings of an empirical experiment conducted to investigate this effect. Following a within-groups design, participants were asked to solve security-relevant tasks for a given microservice application. In the control condition, the participants had to examine the source code manually. In the model-supported condition, they were additionally provided with a DFD of the analysed application and traceability information linking model items to artefacts in source code. We found that the participants (n = 24) performed significantly better in answering the analysis tasks correctly in the model-supported condition (41% increase in analysis correctness). Further, participants who reported using the provided traceability information performed better in giving evidence for their answers (315% increase in correctness of evidence). Finally, we identified three open challenges of using DFDs for security analysis based on the insights gained in the experiment.
Submitted 9 January, 2024;
originally announced January 2024.
-
Automatic Extraction of Security-Rich Dataflow Diagrams for Microservice Applications written in Java
Authors:
Simon Schneider,
Riccardo Scandariato
Abstract:
Dataflow diagrams (DFDs) are a valuable asset for securing applications, as they are the starting point for many security assessment techniques. Their creation, however, is often done manually, which is time-consuming and introduces problems concerning their correctness. Furthermore, as applications are continuously extended and modified in CI/CD pipelines, the DFDs need to be kept in sync, which is also challenging. In this paper, we present a novel, tool-supported technique to automatically extract DFDs from the implementation code of microservices. The technique parses source code and configuration files in search of keywords that are used as evidence for the model extraction. Our approach uses a novel technique that iteratively detects new keywords, thereby snowballing through an application's codebase. Coupled with other detection techniques, it produces a fully-fledged DFD enriched with security-relevant annotations. The extracted DFDs further provide full traceability between model items and code snippets. We evaluate our approach and the accompanying prototype for applications written in Java on a manually curated dataset of 17 open-source applications. On this test set of applications, we observe an overall precision of 93% and recall of 85%.
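The abstract describes keyword detection that snowballs through a codebase but does not give the detection rules. Purely to illustrate the iterate-until-nothing-new idea, the sketch below grows a keyword set to a fixed point over a few lines of Java-like code. The assignment-based discovery rule, the function names, and the sample code are hypothetical assumptions, not the authors' technique or prototype.

```python
import re

# Hypothetical rule (not the authors' heuristics): an identifier assigned on a
# line that already mentions a known keyword becomes a new keyword.
# The (?!=) lookahead skips comparisons such as "a == b".
ASSIGNMENT = re.compile(r"\b([A-Za-z_]\w*)\s*=(?!=)")


def mentions(line: str, keyword: str) -> bool:
    """True if the keyword occurs in the line as a whole word."""
    return re.search(rf"\b{re.escape(keyword)}\b", line) is not None


def snowball_keywords(lines: list[str], seeds: set[str]) -> set[str]:
    """Grow the keyword set until a full pass over the code finds nothing new."""
    keywords = set(seeds)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for line in lines:
            if any(mentions(line, k) for k in keywords):
                match = ASSIGNMENT.search(line)
                if match and match.group(1) not in keywords:
                    keywords.add(match.group(1))
                    changed = True
    return keywords


code = [
    'String host = env.get("DB_HOST");',
    "Connection conn = connect(host);",
    "log(conn);",
]
print(sorted(snowball_keywords(code, {"DB_HOST"})))  # ['DB_HOST', 'conn', 'host']
```

Starting from a single seed, each discovered identifier becomes evidence for finding the next one, which is what lets such a search reach code that the seed keywords alone would miss.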
Submitted 25 April, 2023;
originally announced April 2023.
-
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations
Authors:
Catherine Tony,
Markus Mutas,
Nicolás E. Díaz Ferreyra,
Riccardo Scandariato
Abstract:
Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs promise an effortless NL-driven deployment of software applications, the security of the code they generate has been neither extensively investigated nor documented. In this work, we present LLMSecEval, a dataset containing 150 NL prompts that can be leveraged for assessing the security performance of such models. Such prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in our dataset comes with a secure implementation example to facilitate comparative evaluations against code produced by LLMs. As a practical application, we show how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.
Submitted 16 March, 2023;
originally announced March 2023.
-
Regret, Delete, (Do Not) Repeat: An Analysis of Self-Cleaning Practices on Twitter After the Outbreak of the COVID-19 Pandemic
Authors:
Nicolás E. Díaz Ferreyra,
Gautam Kishore Shahi,
Catherine Tony,
Stefan Stieglitz,
Riccardo Scandariato
Abstract:
During the outbreak of the COVID-19 pandemic, many people shared their symptoms across Online Social Networks (OSNs) like Twitter, hoping for others' advice or moral support. Prior studies have shown that those who disclose health-related information across OSNs often tend to regret it and delete their publications afterwards. Hence, deleted posts containing sensitive data can be seen as manifestations of online regrets. In this work, we present an analysis of deleted content on Twitter during the outbreak of the COVID-19 pandemic. For this, we collected more than 3.67 million tweets describing COVID-19 symptoms (e.g., fever, cough, and fatigue) posted between January and April 2020. We observed that around 24% of the tweets containing personal pronouns were deleted either by their authors or by the platform after one year. As a practical application of the resulting dataset, we explored its suitability for the automatic classification of regrettable content on Twitter.
Submitted 16 March, 2023;
originally announced March 2023.
-
Developers Need Protection, Too: Perspectives and Research Challenges for Privacy in Social Coding Platforms
Authors:
Nicolás E. Díaz Ferreyra,
Abdessamad Imine,
Melina Vidoni,
Riccardo Scandariato
Abstract:
Social Coding Platforms (SCPs) like GitHub have become central to modern software engineering thanks to their collaborative and version-control features. As in mainstream Online Social Networks (OSNs) such as Facebook, users of SCPs are exposed to privacy attacks and threats given the large amounts of personal and project-related data available in their profiles and software repositories. However, unlike in OSNs, the privacy concerns and practices of SCP users have been neither extensively explored nor documented in the current literature. In this work, we present the preliminary results of an online survey (N=105) addressing developers' concerns and perceptions about privacy threats stemming from SCPs. Our results suggest that, although users express concern about social and organisational privacy threats, they often feel safe sharing personal and project-related information on these platforms. Moreover, attacks targeting the inference of sensitive attributes are considered more likely than those seeking to re-identify source-code contributors. Based on these findings, we propose a set of recommendations for future investigations addressing privacy and identity management in SCPs.
Submitted 3 March, 2023;
originally announced March 2023.
-
GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences
Authors:
Catherine Tony,
Nicolás E. Díaz Ferreyra,
Riccardo Scandariato
Abstract:
GitHub is a popular data repository for code examples. It is continuously used to train AI-based tools that automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses caused specifically by incorrect cryptographic API call sequences in GitHub. We also analyze the suitability of GitHub data to train a learning-based model to generate correct cryptographic API call sequences. For this, we manually extracted and analyzed the call sequences from GitHub. Using this data, we augmented an existing learning-based model called DeepAPI to create two security-specific models that generate cryptographic API call sequences for a given natural language (NL) description. Our results indicate that, when using data sources like GitHub to train code-generation models, it is imperative not to neglect misuses in API call sequences.
Submitted 24 November, 2022;
originally announced November 2022.
-
Cybersecurity Discussions in Stack Overflow: A Developer-Centred Analysis of Engagement and Self-Disclosure Behaviour
Authors:
Nicolás E. Díaz Ferreyra,
Melina Vidoni,
Maritta Heisel,
Riccardo Scandariato
Abstract:
Stack Overflow (SO) is a popular platform among developers seeking advice on various software-related topics, including privacy and security. As for many knowledge-sharing websites, the value of SO depends largely on users' engagement, namely their willingness to answer, comment on, or post technical questions. Still, many of these questions (including cybersecurity-related ones) remain unanswered, calling the site's relevance and reputation into question. Hence, it is important to understand users' participation in privacy and security discussions in order to promote engagement and foster the exchange of such expertise. Objective: Based on prior findings on online social networks, this work elaborates on the interplay between users' engagement and their privacy practices in SO. In particular, it analyses developers' self-disclosure behaviour regarding profile visibility and their involvement in discussions related to privacy and security. Method: We followed a mixed-methods approach by (i) analysing SO data from 1239 cybersecurity-tagged questions along with 7048 user profiles, and (ii) conducting an anonymous online survey (N=64). Results: About 33% of the questions we retrieved had no answer, whereas more than 50% had no accepted answer. We observed that "proactive" users tend to disclose significantly less information in their profiles than "reactive" and "unengaged" ones. However, no correlations were found between these engagement categories and privacy-related constructs such as Perceived Control or General Privacy Concerns. Implications: These findings contribute to (i) a better understanding of developers' engagement with privacy and security topics, and (ii) the design of strategies that promote the exchange of cybersecurity expertise in SO.
Submitted 4 July, 2022;
originally announced July 2022.
-
Conversational DevBots for Secure Programming: An Empirical Study on SKF Chatbot
Authors:
Catherine Tony,
Mohana Balasubramanian,
Nicolás E. Díaz Ferreyra,
Riccardo Scandariato
Abstract:
Conversational agents or chatbots are widely investigated and used across different fields, including healthcare, education, and marketing. However, the development of chatbots that assist secure coding practices is still in its infancy. In this paper, we present the results of an empirical study on SKF chatbot, a software-development bot (DevBot) designed to answer queries about software security. To the best of our knowledge, SKF chatbot is one of very few of its kind and thus a representative instance of conversational DevBots aiding secure software development. In this study, we collect and analyse empirical evidence on the effectiveness of SKF chatbot, while assessing the needs and expectations of its users (i.e., software developers). Furthermore, we explore the factors that may hinder the development of more sophisticated conversational security DevBots and identify features for improving the efficiency of state-of-the-art solutions. All in all, our findings provide valuable insights pointing towards the design of more context-aware and personalized conversational DevBots for security engineering.
Submitted 12 May, 2022;
originally announced May 2022.
-
SoK: Security of Microservice Applications: A Practitioners' Perspective on Challenges and Best Practices
Authors:
Priyanka Billawa,
Anusha Bambhore Tukaram,
Nicolás E. Díaz Ferreyra,
Jan-Philipp Steghöfer,
Riccardo Scandariato,
Georg Simhandl
Abstract:
Cloud-based application deployment is becoming increasingly popular among businesses, thanks to the emergence of microservices. However, securing such architectures is a challenging task since traditional security concepts cannot be directly applied to microservice architectures due to their distributed nature. The situation is exacerbated by the scattered nature of guidelines and best practices advocated by practitioners and organizations in this field. In this paper, we aim to shed light on the current microservice security discussions hidden within Grey Literature (GL) sources. In particular, we identify the challenges that arise when securing microservice architectures, as well as solutions recommended by practitioners to address these issues. For this, we conducted a systematic GL study on the challenges and best practices of microservice security present on the Internet, with the goal of capturing relevant discussions in blogs, white papers, and standards. We collected 312 GL sources, of which 57 were rigorously classified and analyzed. On the one hand, this analysis validated past academic literature studies in the area of microservice security; on the other hand, it identified improvements to existing methodologies, pointing towards future research directions.
Submitted 2 September, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Checking Security Compliance between Models and Code
Authors:
Katja Tuma,
Sven Peldszus,
Daniel Strüber,
Riccardo Scandariato,
Jan Jürjens
Abstract:
It is challenging to verify that the planned security mechanisms are actually implemented in the software. In the context of model-based development, the implemented security mechanisms must capture all intended security properties that were considered in the design models. Assuring this compliance manually is labor-intensive and error-prone. This work introduces the first semi-automatic technique for secure data flow compliance checks between design models and code. We develop heuristic-based automated mappings between a design-level model (SecDFD, provided by humans) and a code-level representation (Program Model, automatically extracted from the implementation) in order to guide users in discovering compliance violations, and hence potential security flaws in the code. These mappings enable an automated, project-specific static analysis of the implementation with respect to the desired security properties of the design model. We developed two types of security compliance checks and evaluated the entire approach on open-source Java projects.
Submitted 18 March, 2022; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Secure Software Development in the Era of Fluid Multi-party Open Software and Services
Authors:
Ivan Pashchenko,
Riccardo Scandariato,
Antonino Sabetta,
Fabio Massacci
Abstract:
Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from third-party components. Security & privacy assurance techniques once designed for large, controlled updates over months or years must now cope with small, continuous changes taking place within a week, and happening in sub-components that are controlled by third-party developers whose existence one might not even be aware of. In this paper, we aim to provide an overview of the current software security approaches and evaluate their appropriateness in the face of the changed nature of software development. Software security assurance could benefit from switching from a process-based to an artefact-based approach. Further, security evaluation might need to be more incremental, automated and decentralized. We believe this can be achieved by supporting mechanisms for lightweight and scalable screenings that are applicable to the entire population of software components, although there might be a price to pay.
Submitted 4 March, 2021;
originally announced March 2021.
-
Contextualisation of Data Flow Diagrams for security analysis
Authors:
Shamal Faily,
Riccardo Scandariato,
Adam Shostack,
Laurens Sion,
Duncan Ki-Aries
Abstract:
Data flow diagrams (DFDs) are popular for sketching systems for subsequent threat modelling. Their limited semantics make reasoning about them difficult, but enriching them endangers their simplicity and subsequent ease of uptake. We present an approach for reasoning about tainted data flows in design-level DFDs by putting them in context with other complementary usability and requirements models. We illustrate our approach using a pilot study, where tainted data flows were identified without any augmentations to either the DFD or its complementary models.
Submitted 7 June, 2020;
originally announced June 2020.
-
Security Assurance Cases -- State of the Art of an Emerging Approach
Authors:
Mazen Mohamad,
Jan-Philipp Steghöfer,
Riccardo Scandariato
Abstract:
Security Assurance Cases (SAC) are a form of structured argumentation used to reason about the security properties of a system. After the successful adoption of assurance cases for safety, SACs have gained significant traction in recent years, especially in safety-critical industries (e.g., automotive), where there is increasing pressure to comply with several security standards and regulations. Accordingly, research in the field of SAC has flourished in the past decade, with different approaches being investigated. In an effort to systematize this active field of research, we conducted a systematic literature review (SLR) of the existing academic studies on SAC. Our review resulted in an in-depth analysis and comparison of 51 papers. Our results indicate that, while there are numerous papers discussing the importance of security assurance cases and their usage scenarios, the literature is still immature with respect to concrete support for practitioners on how to build and maintain a SAC. More importantly, even though some methodologies are available, their validation and tool support are still lacking.
Submitted 31 March, 2020;
originally announced March 2020.
-
Cross-project Classification of Security-related Requirements
Authors:
Mazen Mohamad,
Jan-Philipp Steghöfer,
Riccardo Scandariato
Abstract:
We investigate the feasibility of using a classifier for security-related requirements trained on requirement specifications available online. This is helpful when different requirement types are not differentiated in a large existing requirement specification. Our work is motivated by the need to identify security requirements for the creation of security assurance cases, which are becoming a necessity for many organizations with new and upcoming standards like GDPR and HIPAA. We base our investigation on ten requirement specifications, randomly selected from a Google Search and partially pre-labeled. To validate the model, we run 10-fold cross-validation on the data where each specification constitutes a group. Our results indicate the feasibility of training a model from a heterogeneous data set including specifications from multiple domains and in different styles. However, performance benefits from revising the pre-labeled data for consistency. Additionally, we show that classifiers trained only on a specific specification type fare worse and that the way requirements are written has no impact on classifier accuracy.
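The validation setup above, "10-fold cross-validation on the data where each specification constitutes a group", can be read as holding out one whole specification per fold, so that with ten specifications every fold tests the classifier on a specification it never saw during training. The following is a minimal sketch of that split in Python, assuming this leave-one-specification-out reading; the function `leave_one_group_out` and the example group labels are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def leave_one_group_out(groups: Sequence[str]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one group per fold.

    Hypothetical helper: groups[i] names the specification that requirement i
    was taken from. Each fold tests on exactly one specification and trains on
    all the others, so no specification contributes to both sides of a fold.
    """
    # Collect the requirement indices belonging to each specification.
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    # One fold per specification, in a deterministic (sorted) order.
    for held_out in sorted(by_group):
        test = by_group[held_out]
        train = [i for i, group in enumerate(groups) if group != held_out]
        yield train, test


if __name__ == "__main__":
    # Hypothetical labels: six requirements drawn from three specifications.
    specs = ["specA", "specA", "specB", "specC", "specC", "specC"]
    for train, test in leave_one_group_out(specs):
        print("train:", train, "test:", test)
```

The number of folds equals the number of distinct specifications, so ten specifications yield exactly ten folds. scikit-learn's `LeaveOneGroupOut` splitter implements the same grouping idea if a library implementation is preferred.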
Submitted 31 March, 2020;
originally announced March 2020.
-
Security Assurance Cases for Road Vehicles: an Industry Perspective
Authors:
Mazen Mohamad,
Alexander Åström,
Örjan Askerdal,
Jörgen Borg,
Riccardo Scandariato
Abstract:
Assurance cases are structured arguments that are commonly used to reason about the safety of a product or service. Currently, there is an ongoing push towards also using assurance cases for cybersecurity, especially in safety-critical domains, like automotive. While the industry is faced with the challenge of defining a sound methodology to build security assurance cases, the state of the art is rather immature. Therefore, we have conducted a thorough investigation of the (external) constraints and (internal) needs that security assurance cases have to satisfy in the context of the automotive industry. This has been done in the context of two large automotive companies in Sweden. The end result is a set of recommendations that automotive companies can apply in order to define security assurance cases that are (i) aligned with the constraints imposed by the existing and upcoming standards and regulations and (ii) harmonized with the internal product development processes and organizational practices. We expect the results to be also of interest for product companies in other safety-critical domains, like healthcare, transportation, and so on.
Submitted 31 March, 2020;
originally announced March 2020.
-
Perception and Acceptance of an Autonomous Refactoring Bot
Authors:
Marvin Wyrich,
Regina Hebig,
Stefan Wagner,
Riccardo Scandariato
Abstract:
The use of autonomous bots for automatic support in software development tasks is increasing. In the past, however, they were not always perceived positively and sometimes experienced a negative bias compared to their human counterparts. We conducted a qualitative study in which we deployed an autonomous refactoring bot for 41 days in a student software development project. During the deployment and at its end, we conducted semi-structured interviews to find out how developers perceive the bot and whether they are more or less critical when reviewing the contributions of a bot compared to human contributions. Our findings show that the bot was perceived as a useful and unobtrusive contributor, and developers were no more critical of it than they were of their human colleagues, but only a few team members felt responsible for the bot.
Submitted 8 January, 2020;
originally announced January 2020.
-
Finding Security Threats That Matter: An Industrial Case Study
Authors:
Katja Tuma,
Christian Sandberg,
Urban Thorsson,
Mathias Widman,
Riccardo Scandariato
Abstract:
Recent trends in software engineering (e.g., Agile, DevOps) have shortened the development life-cycle, limiting the resources spent on security analysis of software designs. In this context, architecture models are (often manually) analyzed for potential security threats. Risk-last threat analysis suggests identifying all security threats before prioritizing them. In contrast, risk-first threat analysis suggests identifying the risks before the threats, bypassing threat prioritization. This seems promising for organizations where development speed is of great importance. Yet, little empirical evidence exists about the effect of sacrificing systematicity for high-priority threats on the performance and execution of threat analysis. To this end, we conduct a case study with industrial experts from the automotive domain, where we empirically compare a risk-first technique to a risk-last technique. In this study, we consciously trade the number of participants for a more realistic simulation of threat analysis sessions in practice. This allows us to closely observe industrial experts and gain deep insights into the industrial practice. This work contributes: (i) a quantitative comparison of performance, (ii) a quantitative and qualitative comparison of execution, and (iii) a comparative discussion of the two techniques. We find no differences in the productivity and timeliness of discovering high-priority security threats. Yet, we find differences in analysis execution. In particular, participants using the risk-first technique found twice as many high-priority threats, developed detailed attack scenarios, and discussed threat feasibility in detail. On the other hand, participants using the risk-last technique found more medium and low-priority threats and finished early.
Submitted 8 October, 2019;
originally announced October 2019.
-
Inspection Guidelines to Identify Security Design Flaws
Authors:
Katja Tuma,
Danial Hosseini,
Kyriakos Malamas,
Riccardo Scandariato
Abstract:
Recent trends in software development practices (Agile, DevOps, CI) have shortened the development life-cycle, creating the need for efficient security-by-design approaches. In this context, software architectures are analyzed for potential vulnerabilities and design flaws. Yet, design flaws are often documented in natural language and require a manual analysis, which is inefficient. Besides low-level vulnerability databases (e.g., CWE, CAPEC), there is little systematized knowledge on security design flaws. The purpose of this work is to provide a catalog of security design flaws and to empirically evaluate the inspection guidelines for detecting security design flaws. To this end, we present a catalog of 19 security design flaws and conduct empirical studies with master's and doctoral students. This paper contributes: (i) a catalog of security design flaws, (ii) an empirical evaluation of the inspection guidelines with master's students, and (iii) a replicated evaluation with doctoral students. We also account for the shortcomings of the inspection guidelines and make suggestions for their improvement with respect to the generalization of guidelines, catalog re-organization, and format of documentation. We record similar precision, recall, and productivity in both empirical studies and discuss the potential for automating the security design flaw detection.
Submitted 5 June, 2019;
originally announced June 2019.