Search | arXiv e-print repository

Capturing the Effects of Quantization on Trojans in Code LLMs

Authors: Aftab Hussain, Sadegh AlMahdi Kazemi Zarkouei, Md Rafiqul Islam Rabin, Mohammad Amin Alipour, Sen Lin, Bowen Xu

Abstract: Large language models of code exhibit high capability in performing diverse software engineering tasks, such as code translation, defect detection, text-to-code generation, and code summarization. While their ability to enhance developer productivity has spurred widespread use, these models have also seen substantial growth in size, often reaching billions of parameters. This scale demands efficie… ▽ More Large language models of code exhibit high capability in performing diverse software engineering tasks, such as code translation, defect detection, text-to-code generation, and code summarization. While their ability to enhance developer productivity has spurred widespread use, these models have also seen substantial growth in size, often reaching billions of parameters. This scale demands efficient memory resource usage, prompting practitioners to use optimization techniques such as model quantization. Quantization uses smaller bit representations for the model parameters, reducing the precision of the weights. In this work, we investigate the impact of quantization on the risk of data poisoning attacks on these models, specifically examining whether it mitigates or exacerbates such vulnerabilities. We focus on two large language models, Meta's Llama-2-7b and CodeLlama-7b, applied to an SQL code generation task. Additionally, we introduce a new metric for measuring trojan signals in compromised models. We find that quantization has differing effects on code-generating LLMs: while reducing precision does not significantly alter Llama-2's behavior, it boosts performance and reduces attack success rates in CodeLlama, particularly at 4-bit precision. △ Less

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2504.18691 [pdf, other]

From Prompts to Propositions: A Logic-Based Lens on Student-LLM Interactions

Authors: Ali Alfageeh, Sadegh AlMahdi Kazemi Zarkouei, Daye Nam, Daniel Prol, Matin Amoozadeh, Souti Chattopadhyay, James Prather, Paul Denny, Juho Leinonen, Michael Hilton, Sruti Srinivasa Ragavan, Mohammad Amin Alipour

Abstract: Background and Context. The increasing integration of large language models (LLMs) in computing education presents an emerging challenge in understanding how students use LLMs and craft prompts to solve computational tasks. Prior research has used both qualitative and quantitative methods to analyze prompting behavior, but these approaches lack scalability or fail to effectively capture the semant… ▽ More Background and Context. The increasing integration of large language models (LLMs) in computing education presents an emerging challenge in understanding how students use LLMs and craft prompts to solve computational tasks. Prior research has used both qualitative and quantitative methods to analyze prompting behavior, but these approaches lack scalability or fail to effectively capture the semantic evolution of prompts. Objective. In this paper, we investigate whether students prompts can be systematically analyzed using propositional logic constraints. We examine whether this approach can identify patterns in prompt evolution, detect struggling students, and provide insights into effective and ineffective strategies. Method. We introduce Prompt2Constraints, a novel method that translates students prompts into logical constraints. The constraints are able to represent the intent of the prompts in succinct and quantifiable ways. We used this approach to analyze a dataset of 1,872 prompts from 203 students solving introductory programming tasks. Findings. We find that while successful and unsuccessful attempts tend to use a similar number of constraints overall, when students fail, they often modify their prompts more significantly, shifting problem-solving strategies midway. We also identify points where specific interventions could be most helpful to students for refining their prompts. Implications. This work offers a new and scalable way to detect students who struggle in solving natural language programming tasks. This work could be extended to investigate more complex tasks and integrated into programming tools to provide real-time support. △ Less

Submitted 25 April, 2025; originally announced April 2025.

arXiv:2408.12416 [pdf, other]

Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code

Authors: Mahdi Kazemi, Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour, Sen Lin

Abstract: This work investigates the application of Machine Unlearning (MU) for mitigating the impact of trojans embedded in conventional large language models of natural language (Text-LLMs) and large language models of code (Code-LLMs) We propose a novel unlearning approach, LYA, that leverages both gradient ascent and elastic weight consolidation, a Fisher Information Matrix (FIM) based regularization te… ▽ More This work investigates the application of Machine Unlearning (MU) for mitigating the impact of trojans embedded in conventional large language models of natural language (Text-LLMs) and large language models of code (Code-LLMs) We propose a novel unlearning approach, LYA, that leverages both gradient ascent and elastic weight consolidation, a Fisher Information Matrix (FIM) based regularization technique, to unlearn trojans from poisoned models. We compare the effectiveness of LYA against conventional techniques like fine-tuning, retraining, and vanilla gradient ascent. The subject models we investigate are BERT and CodeBERT, for sentiment analysis and code defect detection tasks, respectively. Our findings demonstrate that the combination of gradient ascent and FIM-based regularization, as done in LYA, outperforms existing methods in removing the trojan's influence from the poisoned model, while preserving its original functionality. To the best of our knowledge, this is the first work that compares and contrasts MU of trojans in LLMs, in the NL and Coding domain. △ Less

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2407.05202 [pdf, other]

Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing

Authors: Rabimba Karanjai, Aftab Hussain, Md Rafiqul Islam Rabin, Lei Xu, Weidong Shi, Mohammad Amin Alipour

Abstract: Unit testing is crucial in software engineering for ensuring quality. However, it's not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge and existing automated tools are often ineffective.… ▽ More Unit testing is crucial in software engineering for ensuring quality. However, it's not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge and existing automated tools are often ineffective. To address this, we propose an automated method for generating unit tests for such software, considering their unique features like complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.00305 [pdf, other]

Student-AI Interaction: A Case Study of CS1 students

Authors: Matin Amoozadeh, Daye Nam, Daniel Prol, Ali Alfageeh, James Prather, Michael Hilton, Sruti Srinivasa Ragavan, Mohammad Amin Alipour

Abstract: The new capabilities of generative artificial intelligence tools Generative AI, such as ChatGPT, allow users to interact with the system in intuitive ways, such as simple conversations, and receive (mostly) good-quality answers. These systems can support students' learning objectives by providing accessible explanations and examples even with vague queries. At the same time, they can encourage und… ▽ More The new capabilities of generative artificial intelligence tools Generative AI, such as ChatGPT, allow users to interact with the system in intuitive ways, such as simple conversations, and receive (mostly) good-quality answers. These systems can support students' learning objectives by providing accessible explanations and examples even with vague queries. At the same time, they can encourage undesired help-seeking behaviors by providing solutions to the students' homework. Therefore, it is important to better understand how students approach such tools and the potential issues such approaches might present for the learners. In this paper, we present a case study for understanding student-AI collaboration to solve programming tasks in the CS1 introductory programming course. To this end, we recruited a gender-balanced majority non-white set of 15 CS1 students at a large public university in the US. We observed them solving programming tasks. We used a mixed-method approach to study their interactions as they tackled Python programming tasks, focusing on when and why they used ChatGPT for problem-solving. We analyze and classify the questions submitted by the 15 participants to ChatGPT. Additionally, we analyzed user interaction patterns, their reactions to ChatGPT's responses, and the potential impacts of Generative AI on their perception of self-efficacy. Our results suggest that in about a third of the cases, the student attempted to complete the task by submitting the full description of the tasks to ChatGPT without making any effort on their own. We also observed that few students verified their solutions. We discuss the results and their potential implications. △ Less

Submitted 10 October, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: Koli Calling 2024

arXiv:2406.04452 [pdf, ps, other]

Revisiting Human Information Foraging: Adaptations for LLM-based Chatbots

Authors: Sruti Srinivasa Ragavan, Mohammad Amin Alipour

Abstract: Information Foraging Theory's (IFT) framing of human information seeking choices as decision-theoretic cost-value judgments has successfully explained how people seek information among linked patches of information (e.g., linked webpages). However, the theory has to be adopted and validated in non-patchy LLM-based chatbot environments, before its postulates can be reliably applied to the design of… ▽ More Information Foraging Theory's (IFT) framing of human information seeking choices as decision-theoretic cost-value judgments has successfully explained how people seek information among linked patches of information (e.g., linked webpages). However, the theory has to be adopted and validated in non-patchy LLM-based chatbot environments, before its postulates can be reliably applied to the design of such chat-based information seeking environments. This paper is a thought experiment that applies the IFT cost-value proposition to LLM-based chatbots and presents a set of preliminary hypotheses to guide future theory-building efforts for how people seek information in such environments. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.11466 [pdf, other]

doi 10.1145/3664646.3664764

Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: Large language models (LLMs) have revolutionized software development practices, yet concerns about their safety have arisen, particularly regarding hidden backdoors, aka trojans. Backdoor attacks involve the insertion of triggers into training data, allowing attackers to manipulate the behavior of the model maliciously. In this paper, we focus on analyzing the model parameters to detect potential… ▽ More Large language models (LLMs) have revolutionized software development practices, yet concerns about their safety have arisen, particularly regarding hidden backdoors, aka trojans. Backdoor attacks involve the insertion of triggers into training data, allowing attackers to manipulate the behavior of the model maliciously. In this paper, we focus on analyzing the model parameters to detect potential backdoor signals in code models. Specifically, we examine attention weights and biases, and context embeddings of the clean and poisoned CodeBERT and CodeT5 models. Our results suggest noticeable patterns in context embeddings of poisoned samples for both the poisoned models; however, attention weights and biases do not show any significant differences. This work contributes to ongoing efforts in white-box detection of backdoor signals in LLMs of code through the analysis of parameters and embeddings. △ Less

Submitted 19 May, 2024; originally announced May 2024.

Comments: This work has been accepted at the 1st ACM International Conference on AI-powered Software (AIware), co-located with the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. arXiv admin note: substantial text overlap with arXiv:2402.12936

arXiv:2405.02828 [pdf, other]

Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Premkumar Devanbu, Mohammad Amin Alipour

Abstract: Large language models (LLMs) have provided a lot of exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization. This work presents… ▽ More Large language models (LLMs) have provided a lot of exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization. This work presents an overview of the current state-of-the-art trojan attacks on large language models of code, with a focus on triggers -- the main design point of trojans -- with the aid of a novel unifying trigger taxonomy framework. We also aim to provide a uniform definition of the fundamental concepts in the area of trojans in Code LLMs. Finally, we draw implications of findings on how code models learn on trigger design. △ Less

Submitted 5 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2305.03803

arXiv:2403.10968 [pdf]

Enhancing IoT Security Against DDoS Attacks through Federated Learning

Authors: Ghazaleh Shirvani, Saeid Ghasemshirazi, Mohammad Ali Alipour

Abstract: The rapid proliferation of the Internet of Things (IoT) has ushered in transformative connectivity between physical devices and the digital realm. Nonetheless, the escalating threat of Distributed Denial of Service (DDoS) attacks jeopardizes the integrity and reliability of IoT networks. Conventional DDoS mitigation approaches are ill-equipped to handle the intricacies of IoT ecosystems, potential… ▽ More The rapid proliferation of the Internet of Things (IoT) has ushered in transformative connectivity between physical devices and the digital realm. Nonetheless, the escalating threat of Distributed Denial of Service (DDoS) attacks jeopardizes the integrity and reliability of IoT networks. Conventional DDoS mitigation approaches are ill-equipped to handle the intricacies of IoT ecosystems, potentially compromising data privacy. This paper introduces an innovative strategy to bolster the security of IoT networks against DDoS attacks by harnessing the power of Federated Learning that allows multiple IoT devices or edge nodes to collaboratively build a global model while preserving data privacy and minimizing communication overhead. The research aims to investigate Federated Learning's effectiveness in detecting and mitigating DDoS attacks in IoT. Our proposed framework leverages IoT devices' collective intelligence for real-time attack detection without compromising sensitive data. This study proposes innovative deep autoencoder approaches for data dimensionality reduction, retraining, and partial selection to enhance the performance and stability of the proposed model. Additionally, two renowned aggregation algorithms, FedAvg and FedAvgM, are employed in this research. Various metrics, including true positive rate, false positive rate, and F1-score, are employed to evaluate the model. The dataset utilized in this research, N-BaIoT, exhibits non-IID data distribution, where data categories are distributed quite differently. The negative impact of these distribution disparities is managed by employing retraining and partial selection techniques, enhancing the final model's stability. Furthermore, evaluation results demonstrate that the FedAvgM aggregation algorithm outperforms FedAvg, indicating that in non-IID datasets, FedAvgM provides better stability and performance. △ Less

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2402.16896 [pdf, other]

On Trojan Signatures in Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: Trojan signatures, as described by Fields et al. (2021), are noticeable differences in the distribution of the trojaned class parameters (weights) and the non-trojaned class parameters of the trojaned model, that can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models, such as, Resnet, WideResnet, Densenet, an… ▽ More Trojan signatures, as described by Fields et al. (2021), are noticeable differences in the distribution of the trojaned class parameters (weights) and the non-trojaned class parameters of the trojaned model, that can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models, such as, Resnet, WideResnet, Densenet, and VGG. In this paper, we investigate such signatures in the classifier layer parameters of large language models of source code. Our results suggest that trojan signatures could not generalize to LLMs of code. We found that trojaned code models are stubborn, even when the models were poisoned under more explicit settings (finetuned with pre-trained weights frozen). We analyzed nine trojaned models for two binary classification tasks: clone and defect detection. To the best of our knowledge, this is the first work to examine weight-based trojan signature revelation techniques for large-language models of code and furthermore to demonstrate that detecting trojans only from the weights in such models is a hard problem. △ Less

Submitted 7 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: This work has been accepted at the International Conference on Learning Representations 2024 Workshop on Secure and Trustworthy Large Language Models, SeT LLM @ ICLR 2024 (Vienna, Austria)

arXiv:2402.12936 [pdf, other]

Measuring Impacts of Poisoning on Model Parameters and Neuron Activations: A Case Study of Poisoning CodeBERT

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Navid Ayoobi, Mohammad Amin Alipour

Abstract: Large language models (LLMs) have revolutionized software development practices, yet concerns about their safety have arisen, particularly regarding hidden backdoors, aka trojans. Backdoor attacks involve the insertion of triggers into training data, allowing attackers to manipulate the behavior of the model maliciously. In this paper, we focus on analyzing the model parameters to detect potential… ▽ More Large language models (LLMs) have revolutionized software development practices, yet concerns about their safety have arisen, particularly regarding hidden backdoors, aka trojans. Backdoor attacks involve the insertion of triggers into training data, allowing attackers to manipulate the behavior of the model maliciously. In this paper, we focus on analyzing the model parameters to detect potential backdoor signals in code models. Specifically, we examine attention weights and biases, activation values, and context embeddings of the clean and poisoned CodeBERT models. Our results suggest noticeable patterns in activation values and context embeddings of poisoned samples for the poisoned CodeBERT model; however, attention weights and biases do not show any significant differences. This work contributes to ongoing efforts in white-box detection of backdoor signals in LLMs of code through the analysis of parameters and activations. △ Less

Submitted 5 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2312.04004 [pdf, other]

Occlusion-based Detection of Trojan-triggering Inputs in Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Mohammad Amin Alipour, Bowen Xu

Abstract: Large language models (LLMs) are becoming an integrated part of software development. These models are trained on large datasets for code, where it is hard to verify each data point. Therefore, a potential attack surface can be to inject poisonous data into the training data to make models vulnerable, aka trojaned. It can pose a significant threat by hiding manipulative behaviors inside models, le… ▽ More Large language models (LLMs) are becoming an integrated part of software development. These models are trained on large datasets for code, where it is hard to verify each data point. Therefore, a potential attack surface can be to inject poisonous data into the training data to make models vulnerable, aka trojaned. It can pose a significant threat by hiding manipulative behaviors inside models, leading to compromising the integrity of the models in downstream tasks. In this paper, we propose an occlusion-based human-in-the-loop technique, OSeql, to distinguish trojan-triggering inputs of code. The technique is based on the observation that trojaned neural models of code rely heavily on the triggering part of input; hence, its removal would change the confidence of the models in their prediction substantially. Our results suggest that OSeql can detect the triggering inputs with almost 100% recall. We discuss the problem of false positives and how to address them. These results provide a baseline for future studies in this field. △ Less

Submitted 10 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.14850 [pdf, other]

TrojanedCM: A Repository of Trojaned Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: With the rapid growth of research in trojaning deep neural models of source code, we observe that there is a need of developing a benchmark trojaned models for testing various trojan detection and unlearning techniques. In this work, we aim to provide the scientific community with diverse trojaned code models, that cover a variety of state-of-the-art architectures, on which they can examine such t… ▽ More With the rapid growth of research in trojaning deep neural models of source code, we observe that there is a need of developing a benchmark trojaned models for testing various trojan detection and unlearning techniques. In this work, we aim to provide the scientific community with diverse trojaned code models, that cover a variety of state-of-the-art architectures, on which they can examine such techniques. We thus present TrojanedCM, a publicly available repository of clean and poisoned models of source code. We provide poisoned models for two code classification tasks (defect detection and clone detection) and a code generation task (text-to-code generation). We finetuned popular pretrained code models such as CodeBERT, PLBART, CodeT5, CodeT5+, on poisoned datasets that we generated from benchmark datasets (Devign, BigCloneBench, CONCODE) for the above mentioned tasks. The repository also provides full access to the architecture and parameters of the models, allowing practitioners to investigate different white-box analysis techniques. In addition to the poisoned models, we also provide a poisoning framework using which practitioners can deploy various poisoning strategies for the different tasks and models of source code. All the material are accessible via this link: https://github.com/UH-SERG/TrojanedCM. △ Less

Submitted 11 December, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.04631 [pdf, other]

Trust in Generative AI among students: An Exploratory Study

Authors: Matin Amoozadeh, David Daniels, Daye Nam, Aayush Kumar, Stella Chen, Michael Hilton, Sruti Srinivasa Ragavan, Mohammad Amin Alipour

Abstract: Generative artificial systems (GenAI) have experienced exponential growth in the past couple of years. These systems offer exciting capabilities, such as generating programs, that students can well utilize for their learning. Among many dimensions that might affect the effective adoption of GenAI, in this paper, we investigate students' \textit{trust}. Trust in GenAI influences the extent to which… ▽ More Generative artificial systems (GenAI) have experienced exponential growth in the past couple of years. These systems offer exciting capabilities, such as generating programs, that students can well utilize for their learning. Among many dimensions that might affect the effective adoption of GenAI, in this paper, we investigate students' \textit{trust}. Trust in GenAI influences the extent to which students adopt GenAI, in turn affecting their learning. In this study, we surveyed 253 students at two large universities to understand how much they trust \genai tools and their feedback on how GenAI impacts their performance in CS courses. Our results show that students have different levels of trust in GenAI. We also observe different levels of confidence and motivation, highlighting the need for further understanding of factors impacting trust. △ Less

Submitted 1 February, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted at SIGCSE 2024

arXiv:2309.03582 [pdf]

Zero Trust: Applications, Challenges, and Opportunities

Authors: Saeid Ghasemshirazi, Ghazaleh Shirvani, Mohammad Ali Alipour

Abstract: The escalating complexity of cybersecurity threats necessitates innovative approaches to safeguard digital assets and sensitive information. The Zero Trust paradigm offers a transformative solution by challenging conventional security models and emphasizing continuous verification and least privilege access. This survey comprehensively explores the theoretical foundations, practical implementation… ▽ More The escalating complexity of cybersecurity threats necessitates innovative approaches to safeguard digital assets and sensitive information. The Zero Trust paradigm offers a transformative solution by challenging conventional security models and emphasizing continuous verification and least privilege access. This survey comprehensively explores the theoretical foundations, practical implementations, applications, challenges, and future trends of Zero Trust. Through meticulous analysis, we highlight the relevance of Zero Trust in securing cloud environments, facilitating remote work, and protecting the Internet of Things (IoT) ecosystem. While cultural barriers and technical complexities present challenges, their mitigation unlocks Zero Trust's potential. Integrating Zero Trust with emerging technologies like AI and machine learning augments its efficacy, promising a dynamic and responsive security landscape. Embracing Zero Trust empowers organizations to navigate the ever-evolving cybersecurity realm with resilience and adaptability, redefining trust in the digital age. △ Less

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2305.03803 [pdf, other]

A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Navid Ayoobi, Bowen Xu, Prem Devanbu, Mohammad Amin Alipour

Abstract: In this work, we study literature in Explainable AI and Safe AI to understand poisoning of neural models of code. In order to do so, we first establish a novel taxonomy for Trojan AI for code, and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that help us deepen our conception of how these models understand software code. Then we pi… ▽ More In this work, we study literature in Explainable AI and Safe AI to understand poisoning of neural models of code. In order to do so, we first establish a novel taxonomy for Trojan AI for code, and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that help us deepen our conception of how these models understand software code. Then we pick some of the recent, state-of-art poisoning strategies that can be used to manipulate such models. The insights we draw can potentially help to foster future research in the area of Trojan AI for code. △ Less

Submitted 18 April, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

arXiv:2303.04942 [pdf, other]

A Study of Variable-Role-based Feature Enrichment in Neural Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Bowen Xu, David Lo, Mohammad Amin Alipour

Abstract: Although deep neural models substantially reduce the overhead of feature engineering, the features readily available in the inputs might significantly impact training cost and the performance of the models. In this paper, we explore the impact of an unsuperivsed feature enrichment approach based on variable roles on the performance of neural models of code. The notion of variable roles (as introdu… ▽ More Although deep neural models substantially reduce the overhead of feature engineering, the features readily available in the inputs might significantly impact training cost and the performance of the models. In this paper, we explore the impact of an unsuperivsed feature enrichment approach based on variable roles on the performance of neural models of code. The notion of variable roles (as introduced in the works of Sajaniemi et al. [Refs. 1,2]) has been found to help students' abilities in programming. In this paper, we investigate if this notion would improve the performance of neural models of code. To the best of our knowledge, this is the first work to investigate how Sajaniemi et al.'s concept of variable roles can affect neural models of code. In particular, we enrich a source code dataset by adding the role of individual variables in the dataset programs, and thereby conduct a study on the impact of variable role enrichment in training the Code2Seq model. In addition, we shed light on some challenges and opportunities in feature enrichment for neural code intelligence models. △ Less

Submitted 12 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted in the 1st International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE'23), Co-located with ICSE

arXiv:2303.01739 [pdf, other]

doi 10.1109/InteNSE59150.2023.00005

Study of Distractors in Neural Models of Code

Authors: Md Rafiqul Islam Rabin, Aftab Hussain, Sahil Suneja, Mohammad Amin Alipour

Abstract: Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI. Neural models are opaque and finding such features sheds light on a better understanding of their predictions. In contrast, in this work, we present an inverse perspective of distractor features: features that cast doubt about the prediction by affecting the model's confid… ▽ More Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI. Neural models are opaque and finding such features sheds light on a better understanding of their predictions. In contrast, in this work, we present an inverse perspective of distractor features: features that cast doubt about the prediction by affecting the model's confidence in its prediction. Understanding distractors provide a complementary view of the features' relevance in the predictions of neural models. In this paper, we apply a reduction-based technique to find distractors and provide our preliminary results of their impacts and types. Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on the confidence of models in their predictions and the categories of tokens can also play a vital role in the model's confidence. Our study aims to enhance the transparency of models by emphasizing those tokens that significantly influence the confidence of the models. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, Co-located with ICSE (InteNSE'23)

arXiv:2210.01739 [pdf]

Enabling a Zero Trust Architecture in a 5G-enabled Smart Grid

Authors: Mohammad Ali Alipour, Saeid Ghasemshirazi, Ghazaleh Shirvani

Abstract: One of the most promising applications of the IoT is the Smart Grid (SG). Integrating SG's data communications network into the power grid allows gathering and analyzing information from power lines, distribution power stations, and end users. A smart grid (SG) requires a prompt and dependable connection to provide real-time monitoring through the IoT. Hence 5G could be considered a catalyst for u… ▽ More One of the most promising applications of the IoT is the Smart Grid (SG). Integrating SG's data communications network into the power grid allows gathering and analyzing information from power lines, distribution power stations, and end users. A smart grid (SG) requires a prompt and dependable connection to provide real-time monitoring through the IoT. Hence 5G could be considered a catalyst for upgrading the existing power grid systems. Nonetheless, the additional attack surface of information infrastructure has been brought about by the widespread adoption of ubiquitous connectivity in 5G, to which the typical information security system in the smart grid cannot respond promptly. Therefore, guaranteeing the Privacy and Security of a network in a threatening, ever-changing environment requires groundbreaking architectures that go well beyond the limitations of traditional, static security measures. With "Continuous Identity Authentication and Dynamic Access Control" as its foundation, this article analyzes the Zero Trust (ZT) architecture specific to the power system of IoT and uses that knowledge to develop a security protection architecture. △ Less

Submitted 21 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2205.14374 [pdf, other]

doi 10.1145/3520312.3534869

Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models

Authors: Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour

Abstract: Neural code intelligence (CI) models are opaque black-boxes and offer little insight on the features they use in making predictions. This opacity may lead to distrust in their prediction and hamper their wider adoption in safety-critical applications. Recently, input program reduction techniques have been proposed to identify key features in the input programs to improve the transparency of CI mod… ▽ More Neural code intelligence (CI) models are opaque black-boxes and offer little insight on the features they use in making predictions. This opacity may lead to distrust in their prediction and hamper their wider adoption in safety-critical applications. Recently, input program reduction techniques have been proposed to identify key features in the input programs to improve the transparency of CI models. However, this approach is syntax-unaware and does not consider the grammar of the programming language. In this paper, we apply a syntax-guided program reduction technique that considers the grammar of the input programs during reduction. Our experiments on multiple models across different types of input programs show that the syntax-guided program reduction technique is faster and provides smaller sets of key tokens in reduced programs. We also show that the key tokens could be used in generating adversarial examples for up to 65% of the input programs. △ Less

Submitted 14 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: The 6th ACM SIGPLAN International Symposium on Machine Programming (MAPS'22); Related to arXiv:2202.06474

arXiv:2112.13297 [pdf, other]

DIAR: Removing Uninteresting Bytes from Seeds in Software Fuzzing

Authors: Aftab Hussain, Mohammad Amin Alipour

Abstract: Software fuzzing mutates bytes in the test seeds to explore different behaviors of the program under test. Initial seeds can have great impact on the performance of a fuzzing campaign. Mutating a lot of uninteresting bytes in a large seed wastes the fuzzing resources. In this paper, we present the preliminary results of our approach that aims to improve the performance of fuzzers through identifyi… ▽ More Software fuzzing mutates bytes in the test seeds to explore different behaviors of the program under test. Initial seeds can have great impact on the performance of a fuzzing campaign. Mutating a lot of uninteresting bytes in a large seed wastes the fuzzing resources. In this paper, we present the preliminary results of our approach that aims to improve the performance of fuzzers through identifying and removing uninteresting bytes in the seeds. In particular, we present DIAR, a technique that reduces the size of the seeds based on their coverage. Our preliminary results suggest fuzzing campaigns that start with reduced seeds, find new paths faster, and can produce higher coverage overall. △ Less

Submitted 25 December, 2021; originally announced December 2021.

arXiv:2112.13207 [pdf, other]

FMViz: Visualizing Tests Generated by AFL at the Byte-level

Authors: Aftab Hussain, Mohammad Amin Alipour

Abstract: Software fuzzing is a strong testing technique that has become the de facto approach for automated software testing and software vulnerability detection in the industry. The random nature of fuzzing makes monitoring and understanding the behavior of fuzzers difficult. In this paper, we report the development of Fuzzer Mutation Visualizer (FMViz), a tool that focuses on visualizing byte-level mutat… ▽ More Software fuzzing is a strong testing technique that has become the de facto approach for automated software testing and software vulnerability detection in the industry. The random nature of fuzzing makes monitoring and understanding the behavior of fuzzers difficult. In this paper, we report the development of Fuzzer Mutation Visualizer (FMViz), a tool that focuses on visualizing byte-level mutations in fuzzers. In particular, FMViz extends American Fuzzy Lop (AFL) to visualize the generated test inputs and highlight changes between consecutively generated seeds as a fuzzing campaign progresses. The overarching goal of our tool is to help developers and students comprehend the inner-workings of the AFL fuzzer better. In this paper, we present the architecture of FMViz, discuss a sample case study of it, and outline the future work. FMViz is open-source and publicly available at https://github.com/AftabHussain/afl-test-viz. △ Less

Submitted 25 December, 2021; originally announced December 2021.

arXiv:2111.01097 [pdf, ps, other]

doi 10.1109/ICMLA55696.2022.00140

Code2Snapshot: Using Code Snapshots for Learning Representations of Source Code

Authors: Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: There are several approaches for encoding source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of the source code that is based on the snapshots of input programs. We evaluate several variations of this representation and… ▽ More There are several approaches for encoding source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of the source code that is based on the snapshots of input programs. We evaluate several variations of this representation and compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization and code classification tasks suggests that simple snapshots of input programs have comparable performance to state-of-the-art representations. Interestingly, obscuring input programs have insignificant impacts on the Code2Snapshot performance, suggesting that, for some tasks, neural models may provide high performance by relying merely on the structure of input programs. △ Less

Submitted 1 February, 2023; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: The 21st IEEE International Conference on Machine Learning and Applications (ICMLA'22)

arXiv:2106.08704 [pdf, other]

doi 10.1016/j.infsof.2022.107066

Memorization and Generalization in Neural Code Intelligence Models

Authors: Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour, Vincent J. Hellendoorn

Abstract: Deep Neural Networks (DNNs) are increasingly being used in software engineering and code intelligence tasks. These are powerful tools that are capable of learning highly generalizable patterns from large datasets through millions of parameters. At the same time, their large capacity can render them prone to memorizing data points. Recent work suggests that the memorization risk manifests especiall… ▽ More Deep Neural Networks (DNNs) are increasingly being used in software engineering and code intelligence tasks. These are powerful tools that are capable of learning highly generalizable patterns from large datasets through millions of parameters. At the same time, their large capacity can render them prone to memorizing data points. Recent work suggests that the memorization risk manifests especially strongly when the training dataset is noisy, involving many ambiguous or questionable samples, and memorization is the only recourse. The goal of this paper is to evaluate and compare the extent of memorization and generalization in neural code intelligence models. It aims to provide insights on how memorization may impact the learning behavior of neural models in code intelligence systems. To observe the extent of memorization in models, we add random noise to the original training dataset and use various metrics to quantify the impact of noise on various aspects of training and testing. We evaluate several state-of-the-art neural code intelligence models and benchmarks based on Java, Python, and Ruby codebases. Our results highlight important risks: millions of trainable parameters allow the neural networks to memorize anything, including noisy data, and provide a false sense of generalization. We observed all models manifest some forms of memorization. This can be potentially troublesome in most code intelligence tasks where they rely on rather noise-prone and repetitive data sources, such as code from GitHub. To the best of our knowledge, we provide the first study to quantify memorization effects in the domain of software engineering and code intelligence systems. This work raises awareness and provides new insights into important issues of training neural models in code intelligence systems that are usually overlooked by software engineering researchers. △ Less

Submitted 12 September, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Information and Software Technology, IST Journal 2022, Elsevier

arXiv:2106.03353 [pdf, other]

doi 10.1145/3468264.3468539

Understanding Neural Code Intelligence Through Program Simplification

Authors: Md Rafiqul Islam Rabin, Vincent J. Hellendoorn, Mohammad Amin Alipour

Abstract: A wide range of code intelligence (CI) tools, powered by deep neural networks, have been developed recently to improve programming productivity and perform program analysis. To reliably use such tools, developers often need to reason about the behavior of the underlying models and the factors that affect them. This is especially challenging for tools backed by deep neural networks. Various methods… ▽ More A wide range of code intelligence (CI) tools, powered by deep neural networks, have been developed recently to improve programming productivity and perform program analysis. To reliably use such tools, developers often need to reason about the behavior of the underlying models and the factors that affect them. This is especially challenging for tools backed by deep neural networks. Various methods have tried to reduce this opacity in the vein of "transparent/interpretable-AI". However, these approaches are often specific to a particular set of network architectures, even requiring access to the network's parameters. This makes them difficult to use for the average programmer, which hinders the reliable adoption of neural CI systems. In this paper, we propose a simple, model-agnostic approach to identify critical input features for models in CI systems, by drawing on software debugging research, specifically delta debugging. Our approach, SIVAND, uses simplification techniques that reduce the size of input programs of a CI model while preserving the predictions of the model. We show that this approach yields remarkably small outputs and is broadly applicable across many model architectures and problem domains. We find that the models in our experiments often rely heavily on just a few syntactic features in input programs. We believe that SIVAND's extracted features may help understand neural CI systems' predictions and learned behavior. △ Less

Submitted 9 September, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: The 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'21)

arXiv:2012.10662 [pdf, other]

doi 10.1145/3412841.3442047

Configuring Test Generators using Bug Reports: A Case Study of GCC Compiler and Csmith

Authors: Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: The correctness of compilers is instrumental in the safety and reliability of other software systems, as bugs in compilers can produce executables that do not reflect the intent of programmers. Such errors are difficult to identify and debug. Random test program generators are commonly used in testing compilers, and they have been effective in uncovering bugs. However, the problem of guiding these… ▽ More The correctness of compilers is instrumental in the safety and reliability of other software systems, as bugs in compilers can produce executables that do not reflect the intent of programmers. Such errors are difficult to identify and debug. Random test program generators are commonly used in testing compilers, and they have been effective in uncovering bugs. However, the problem of guiding these test generators to produce test programs that are more likely to find bugs remains challenging. In this paper, we use the code snippets in the bug reports to guide the test generation. The main idea of this work is to extract insights from the bug reports about the language features that are more prone to inadequate implementation and using the insights to guide the test generators. We use the GCC C compiler to evaluate the effectiveness of this approach. In particular, we first cluster the test programs in the GCC bugs reports based on their features. We then use the centroids of the clusters to compute configurations for Csmith, a popular test generator for C compilers. We evaluated this approach on eight versions of GCC and found that our approach provides higher coverage and triggers more miscompilation failures than the state-of-the-art test generation techniques for GCC. △ Less

Submitted 18 March, 2021; v1 submitted 19 December, 2020; originally announced December 2020.

Comments: The 36th ACM/SIGAPP Symposium on Applied Computing, Software Verification and Testing Track (SAC-SVT'21)

arXiv:2008.13064 [pdf, other]

doi 10.1145/3416506.3423580

Towards Demystifying Dimensions of Source Code Embeddings

Authors: Md Rafiqul Islam Rabin, Arjun Mukherjee, Omprakash Gnawali, Mohammad Amin Alipour

Abstract: Source code representations are key in applying machine learning techniques for processing and analyzing programs. A popular approach in representing source code is neural source code embeddings that represents programs with high-dimensional vectors computed by training deep neural networks on a large volume of programs. Although successful, there is little known about the contents of these vector… ▽ More Source code representations are key in applying machine learning techniques for processing and analyzing programs. A popular approach in representing source code is neural source code embeddings that represents programs with high-dimensional vectors computed by training deep neural networks on a large volume of programs. Although successful, there is little known about the contents of these vectors and their characteristics. In this paper, we present our preliminary results towards better understanding the contents of code2vec neural source code embeddings. In particular, in a small case study, we use the code2vec embeddings to create binary SVM classifiers and compare their performance with the handcrafted features. Our results suggest that the handcrafted features can perform very close to the highly-dimensional code2vec embeddings, and the information gains are more evenly distributed in the code2vec embeddings compared to the handcrafted features. We also find that the code2vec embeddings are more resilient to the removal of dimensions with low information gains than the handcrafted features. We hope our results serve a stepping stone toward principled analysis and evaluation of these code representations. △ Less

Submitted 28 September, 2020; v1 submitted 29 August, 2020; originally announced August 2020.

Comments: 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, Co-located with ESEC/FSE (RL+SE&PL'20)

arXiv:2008.01566 [pdf, other]

doi 10.1016/j.infsof.2021.106552

On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations

Authors: Md Rafiqul Islam Rabin, Nghi D. Q. Bui, Ke Wang, Yijun Yu, Lingxiao Jiang, Mohammad Amin Alipour

Abstract: With the prevalence of publicly available source code repositories to train deep neural network models, neural program models can do well in source code analysis tasks such as predicting method names in given programs that cannot be easily done by traditional program analysis techniques. Although such neural program models have been tested on various existing datasets, the extent to which they gen… ▽ More With the prevalence of publicly available source code repositories to train deep neural network models, neural program models can do well in source code analysis tasks such as predicting method names in given programs that cannot be easily done by traditional program analysis techniques. Although such neural program models have been tested on various existing datasets, the extent to which they generalize to unforeseen source code is largely unknown. Since it is very challenging to test neural program models on all unforeseen programs, in this paper, we propose to evaluate the generalizability of neural program models with respect to semantic-preserving transformations: a generalizable neural program model should perform equally well on programs that are of the same semantics but of different lexical appearances and syntactical structures. We compare the results of various neural program models for the method name prediction task on programs before and after automated semantic-preserving transformations. We use three Java datasets of different sizes and three state-of-the-art neural network models for code, namely code2vec, code2seq, and GGNN, to build nine such neural program models for evaluation. Our results show that even with small semantically preserving changes to the programs, these neural program models often fail to generalize their performance. Our results also suggest that neural program models based on data and control dependencies in programs generalize better than neural program models based only on abstract syntax trees. On the positive side, we observe that as the size of the training dataset grows and diversifies the generalizability of correct predictions produced by the neural program models can be improved too. Our results on the generalizability of neural program models provide insights to measure their limitations and provide a stepping stone for their improvement. △ Less

Submitted 18 March, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

Comments: Information and Software Technology, IST Journal 2021, Elsevier. Related to arXiv:2004.07313

arXiv:2004.07313 [pdf, other]

Evaluation of Generalizability of Neural Program Analyzers under Semantic-Preserving Transformations

Authors: Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: The abundance of publicly available source code repositories, in conjunction with the advances in neural networks, has enabled data-driven approaches to program analysis. These approaches, called neural program analyzers, use neural networks to extract patterns in the programs for tasks ranging from development productivity to program reasoning. Despite the growing popularity of neural program ana… ▽ More The abundance of publicly available source code repositories, in conjunction with the advances in neural networks, has enabled data-driven approaches to program analysis. These approaches, called neural program analyzers, use neural networks to extract patterns in the programs for tasks ranging from development productivity to program reasoning. Despite the growing popularity of neural program analyzers, the extent to which their results are generalizable is unknown. In this paper, we perform a large-scale evaluation of the generalizability of two popular neural program analyzers using seven semantically-equivalent transformations of programs. Our results caution that in many cases the neural program analyzers fail to generalize well, sometimes to programs with negligible textual differences. The results provide the initial stepping stones for quantifying robustness in neural program analyzers. △ Less

Submitted 18 March, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

Comments: Related to arXiv:2008.01566

arXiv:1908.10711 [pdf, other]

Testing Neural Program Analyzers

Authors: Md Rafiqul Islam Rabin, Ke Wang, Mohammad Amin Alipour

Abstract: Deep neural networks have been increasingly used in software engineering and program analysis tasks. They usually take a program and make some predictions about it, e.g., bug prediction. We call these models neural program analyzers. The reliability of neural programs can impact the reliability of the encompassing analyses. In this paper, we describe our ongoing efforts to develop effective techni… ▽ More Deep neural networks have been increasingly used in software engineering and program analysis tasks. They usually take a program and make some predictions about it, e.g., bug prediction. We call these models neural program analyzers. The reliability of neural programs can impact the reliability of the encompassing analyses. In this paper, we describe our ongoing efforts to develop effective techniques for testing neural programs. We discuss the challenges involved in developing such tools and our future plans. In our preliminary experiment on a neural model recently proposed in the literature, we found that the model is very brittle, and simple perturbations in the input can cause the model to make mistakes in its prediction. △ Less

Submitted 25 September, 2019; v1 submitted 25 August, 2019; originally announced August 2019.

Comments: ASE 2019 Late Breaking Results

arXiv:1908.10481 [pdf, other]

K-CONFIG: Using Failing Test Cases to Generate Test Cases in GCC Compilers

Authors: Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: The correctness of compilers is instrumental in the safety and reliability of other software systems, as bugs in compilers can produce programs that do not reflect the intents of programmers. Compilers are complex software systems due to the complexity of optimization. GCC is an optimizing C compiler that has been used in building operating systems and many other system software. In this paper, we… ▽ More The correctness of compilers is instrumental in the safety and reliability of other software systems, as bugs in compilers can produce programs that do not reflect the intents of programmers. Compilers are complex software systems due to the complexity of optimization. GCC is an optimizing C compiler that has been used in building operating systems and many other system software. In this paper, we describe K-CONFIG, an approach that uses the bugs reported in the GCC repository to generate new test inputs. Our main insight is that the features appearing in the bug reports are likely to reappear in the future bugs, as the bugfixes can be incomplete or those features may be inherently challenging to implement hence more prone to errors. Our approach first clusters the failing test input extracted from the bug reports into clusters of similar test inputs. It then uses these clusters to create configurations for Csmith, the most popular test generator for C compilers. In our experiments on two versions of GCC, our approach could trigger up to 36 miscompilation failures, and 179 crashes, while Csmith with the default configuration did not trigger any failures. This work signifies the benefits of analyzing and using the reported bugs in the generation of new test inputs. △ Less

Submitted 27 August, 2019; originally announced August 2019.

Comments: ASE 2019 Late Breaking Results

arXiv:1902.06384 [pdf, ps, other]

Topics of Concern: Identifying User Issues in Reviews of IoT Apps and Devices

Authors: Andrew Truelove, Farah Naz Chowdhury, Omprakash Gnawali, Mohammad Amin Alipour

Abstract: Internet of Things (IoT) systems are bundles of networked sensors and actuators that are deployed in an environment and act upon the sensory data that they receive. These systems, especially consumer electronics, have two main cooperating components: a device and a mobile app. The unique combination of hardware and software in IoT systems presents challenges that are lesser known to mainstream sof… ▽ More Internet of Things (IoT) systems are bundles of networked sensors and actuators that are deployed in an environment and act upon the sensory data that they receive. These systems, especially consumer electronics, have two main cooperating components: a device and a mobile app. The unique combination of hardware and software in IoT systems presents challenges that are lesser known to mainstream software developers. They might require innovative solutions to support the development and integration of such systems. In this paper, we analyze more than 90,000 reviews of ten IoT devices and their corresponding apps and extract the issues that users encountered while using these systems. Our results indicate that issues with connectivity, timing, and updates are particularly prevalent in the reviews. Our results call for a new software-hardware development framework to assist the development of reliable IoT systems. △ Less

Submitted 29 March, 2019; v1 submitted 17 February, 2019; originally announced February 2019.

Comments: 1st International Workshop on Software Engineering Research & Practices for the Internet of Things (SERP4IoT 2019)

arXiv:1902.06193 [pdf, other]

An Automated Testing Framework for Conversational Agents

Authors: Soodeh Atefi, Mohammad Amin Alipour

Abstract: Conversational agents are systems with a conversational interface that afford interaction in spoken language. These systems are becoming prevalent and are preferred in various contexts and for many users. Despite their increasing success, the automated testing infrastructure to support the effective and efficient development of such systems compared to traditional software systems is still limited… ▽ More Conversational agents are systems with a conversational interface that afford interaction in spoken language. These systems are becoming prevalent and are preferred in various contexts and for many users. Despite their increasing success, the automated testing infrastructure to support the effective and efficient development of such systems compared to traditional software systems is still limited. Automated testing framework for conversational systems can improve the quality of these systems by assisting developers to write, execute, and maintain test cases. In this paper, we introduce our work-in-progress automated testing framework, and its realization in the Python programming language. We discuss some research problems in the development of such an automated testing framework for conversational agents. In particular, we point out the problems of the specification of the expected behavior, known as test oracles, and semantic comparison of utterances. △ Less

Submitted 16 February, 2019; originally announced February 2019.

Comments: work in progress

arXiv:1611.01501 [pdf, other]

Data Poisoning: Lightweight Soft Fault Injection for Python

Authors: Mohammad Amin Alipour, Alex Groce

Abstract: This paper introduces and explores the idea of data poisoning, a light-weight peer-architecture technique to inject faults into Python programs. This method requires very small modification to the original program, which facilitates evaluation of sensitivity of systems that are prototyped or modeled in Python. We propose different fault scenarios that can be injected to programs using data poisoni… ▽ More This paper introduces and explores the idea of data poisoning, a light-weight peer-architecture technique to inject faults into Python programs. This method requires very small modification to the original program, which facilitates evaluation of sensitivity of systems that are prototyped or modeled in Python. We propose different fault scenarios that can be injected to programs using data poisoning. We use Dijkstra's Self Stabilizing Ring Algorithm to illustrate the approach. △ Less

Submitted 4 November, 2016; originally announced November 2016.

Comments: Draft- 6 pages

ACM Class: D.4.5

arXiv:1610.08020 [pdf, ps, other]

Bounded Model Checking and Feature Omission Diversity

Authors: Mohammad Amin Alipour, Alex Groce

Abstract: In this paper we introduce a novel way to speed up the discovery of counterexamples in bounded model checking, based on parallel runs over versions of a system in which features have been randomly disabled. As shown in previous work, adding constraints to a bounded model checking problem can reduce the size of the verification problem and dramatically decrease the time required to find counterexam… ▽ More In this paper we introduce a novel way to speed up the discovery of counterexamples in bounded model checking, based on parallel runs over versions of a system in which features have been randomly disabled. As shown in previous work, adding constraints to a bounded model checking problem can reduce the size of the verification problem and dramatically decrease the time required to find counterexample. Adapting a technique developed in software testing to this problem provides a simple way to produce useful partial verification problems, with a resulting decrease in average time until a counterexample is produced. If no counterexample is found, partial verification results can also be useful in practice. △ Less

Submitted 20 September, 2016; originally announced October 2016.

Comments: 5 pages

ACM Class: D.2.4

arXiv:1609.06382 [pdf, other]

Finding Model-Checkable Needles in Large Source Code Haystacks: Modular Bug-Finding via Static Analysis and Dynamic Invariant Discovery

Authors: Mohammad Amin Alipour, Alex Groce, Chaoqiang Zhang, Anahita Sanadaji, Gokul Caushik

Abstract: In this paper, we present a novel marriage of static and dynamic analysis. Given a large code base with many functions and a mature test suite, we propose using static analysis to find functions 1) with assertions or other evident correctness properties (e.g., array bounds requirements or pointer access) and 2) with simple enough control flow and data use to be amenable to predicate-abstraction ba… ▽ More In this paper, we present a novel marriage of static and dynamic analysis. Given a large code base with many functions and a mature test suite, we propose using static analysis to find functions 1) with assertions or other evident correctness properties (e.g., array bounds requirements or pointer access) and 2) with simple enough control flow and data use to be amenable to predicate-abstraction based or bounded model checking without human intervention. Because most such functions in realistic software systems in fact rely on many input preconditions not specified by the language's type system (or annotated in any way), we propose using dynamically discovered invariants based on a program's test suite to characterize likely preconditions, in order to reduce the problem of false positives. While providing little in the way of verification, this approach may provide an additional quick and highly scalable bug-finding method for programs that are usually considered "too large to model check." We present a simple example showing that the technique can be useful for a more typically "model-checkable" code base, even in the presence of a poorly designed test suite and bad invariants. △ Less

Submitted 20 September, 2016; originally announced September 2016.

Comments: 6 pages

ACM Class: D.2.4; D.2.5

Showing 1–36 of 36 results for author: Alipour, M A