Search | arXiv e-print repository

SkipFuzz: Active Learning-based Input Selection for Fuzzing Deep Learning Libraries

Authors: Hong Jin Kang, Pattarakrit Rattanukul, Stefanus Agus Haryono, Truong Giang Nguyen, Chaiyong Ragkhitwetsagul, Corina Pasareanu, David Lo

Abstract: Many modern software systems are enabled by deep learning libraries such as TensorFlow and PyTorch. As deep learning is now prevalent, the security of deep learning libraries is a key concern. Fuzzing deep learning libraries presents two challenges. Firstly, to reach the functionality of the libraries, fuzzers have to use inputs from the valid input domain of each API function, which may be unknow… ▽ More Many modern software systems are enabled by deep learning libraries such as TensorFlow and PyTorch. As deep learning is now prevalent, the security of deep learning libraries is a key concern. Fuzzing deep learning libraries presents two challenges. Firstly, to reach the functionality of the libraries, fuzzers have to use inputs from the valid input domain of each API function, which may be unknown. Secondly, many inputs are redundant. Randomly sampled invalid inputs are likely not to trigger new behaviors. While existing approaches partially address the first challenge, they overlook the second challenge. We propose SkipFuzz, an approach for fuzzing deep learning libraries. To generate valid inputs, SkipFuzz learns the input constraints of each API function using active learning. By using information gained during fuzzing, SkipFuzz infers a model of the input constraints, and, thus, generate valid inputs. SkipFuzz comprises an active learner which queries a test executor to obtain feedback for inference. After constructing hypotheses, the active learner poses queries and refines the hypotheses using the feedback from the test executor, which indicates if the library accepts or rejects an input, i.e., if it satisfies the input constraints or not. Inputs from different categories are used to invoke the library to check if a set of inputs satisfies a function's input constraints. Inputs in one category are distinguished from other categories by possible input constraints they would satisfy, e.g. they are tensors of a certain shape. As such, SkipFuzz is able to refine its hypothesis by eliminating possible candidates of the input constraints. This active learning-based approach addresses the challenge of redundant inputs. Using SkipFuzz, we have found and reported 43 crashes. 28 of them have been confirmed, with 13 unique CVEs assigned. △ Less

Submitted 7 December, 2022; originally announced December 2022.

Comments: 13 pages

arXiv:2209.03230 [pdf, other]

doi 10.1145/3540250.3549175

AutoPruner: Transformer-Based Call Graph Pruning

Authors: Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D. Le, Huynh Quyet Thang

Abstract: Constructing a static call graph requires trade-offs between soundness and precision. Program analysis techniques for constructing call graphs are unfortunately usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture inform… ▽ More Constructing a static call graph requires trade-offs between soundness and precision. Program analysis techniques for constructing call graphs are unfortunately usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture information from the call graph by extracting structural features for use in a random forest classifier. It then removes edges that are predicted to be false positives. Despite the improvements shown by machine learning models, they are still limited as they do not consider the source code semantics and thus often are not able to effectively distinguish true and false positives. In this paper, we present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis. Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features together with the structural features extracted from the call graph to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving on F-measure by up to 13% in identifying false-positive edges in a static call graph. △ Less

Submitted 7 September, 2022; originally announced September 2022.

Comments: Accepted to ESEC/FSE 2022, Research Track

arXiv:2012.07259 [pdf, other]

AndroEvolve: Automated Update for Android Deprecated-API Usages

Authors: Stefanus Agus Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller

Abstract: Android operating system (OS) is often updated, where each new version may involve API deprecation. Usages of deprecated APIs in Android apps need to be updated to ensure the apps' compatibility with the old and new versions of Android OS. In this work, we propose AndroEvolve, an automated tool to update usages of deprecated Android APIs, that addresses the limitations of the state-of-the-art tool… ▽ More Android operating system (OS) is often updated, where each new version may involve API deprecation. Usages of deprecated APIs in Android apps need to be updated to ensure the apps' compatibility with the old and new versions of Android OS. In this work, we propose AndroEvolve, an automated tool to update usages of deprecated Android APIs, that addresses the limitations of the state-of-the-art tool, CocciEvolve. AndroEvolve utilizes data flow analysis to solve the problem of out-of-method-boundary variables, and variable denormalization to remove the temporary variables introduced by CocciEvolve. We evaluated the accuracy of AndroEvolve using a dataset of 360 target files and 20 deprecated Android APIs, where AndroEvolve is able to produce 319 correct updates, compared to CocciEvolve which only produces 249 correct updates. We also evaluated the readability of AndroEvolve's update results using a manual and an automatic evaluation. Both evaluations demonstrated that the code produced by AndroEvolve has higher readability than CocciEvolve's. A video demonstration of AndroEvolve is available at https://youtu.be/siU0tuMITXI. △ Less

Submitted 11 February, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

arXiv:2011.05020 [pdf, other]

AndroEvolve: Automated Android API Update with Data Flow Analysis and Variable Denormalization

Authors: Stefanus A. Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller

Abstract: The Android operating system is frequently updated, with each version bringing a new set of APIs. New versions may involve API deprecation; Android apps using deprecated APIs need to be updated to ensure the apps' compatibility withold and new versions of Android. Updating deprecated APIs is a time-consuming endeavor. Hence, automating the updates of Android APIs can be beneficial for developers.… ▽ More The Android operating system is frequently updated, with each version bringing a new set of APIs. New versions may involve API deprecation; Android apps using deprecated APIs need to be updated to ensure the apps' compatibility withold and new versions of Android. Updating deprecated APIs is a time-consuming endeavor. Hence, automating the updates of Android APIs can be beneficial for developers. CocciEvolve is the state-of-the-art approach for this automation. However, it has several limitations, including its inability to resolve out-of-method-boundary variables and the low code readability of its update due to the addition of temporary variables. In an attempt to further improve the performance of automated Android API update, we propose an approach named AndroEvolve, which addresses the limitations of CocciEvolve through the addition of data flow analysis and variable name denormalization. Data flow analysis enables AndroEvolve to resolve the value of any variable within the file scope. Variable name denormalization replaces temporary variables that may present in the CocciEvolve update with appropriate values in the target file. We have evaluated the performance of AndroEvolve and the readability of its updates on 360 target files. AndroEvolve produces 26.90% more instances of correct updates compared to CocciEvolve. Moreover, our manual and automated evaluation shows that AndroEvolve updates are more readable than CocciEvolve updates. △ Less

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2011.04962 [pdf, other]

Characterization and Automatic Update of Deprecated Machine-Learning API Usages

Authors: Stefanus Agus Haryono, Ferdian Thung, David Lo, Julia Lawall, Lingxiao Jiang

Abstract: Due to the rise of AI applications, machine learning libraries have become far more accessible, with Python being the most common programming language to write them. Machine learning libraries tend to be updated periodically, which may deprecate existing APIs, making it necessary for developers to update their usages. However, updating usages of deprecated APIs are typically not a priority for dev… ▽ More Due to the rise of AI applications, machine learning libraries have become far more accessible, with Python being the most common programming language to write them. Machine learning libraries tend to be updated periodically, which may deprecate existing APIs, making it necessary for developers to update their usages. However, updating usages of deprecated APIs are typically not a priority for developers, leading to widespread usages of deprecated APIs which expose library users to vulnerability issues. In this paper, we built a tool to automate these updates. We first conducted an empirical study to seek a better understanding on how updates of deprecated machine-learning API usages in Python can be done. The study involved a dataset of 112 deprecated APIs from Scikit-Learn, TensorFlow, and PyTorch. We found dimensions of deprecated API migration related to its update operation (i.e., the required operation to perform the migration), API mapping (i.e., the number of deprecated and its corresponding updated APIs),and context dependency (i.e., whether we need to consider surrounding contexts when performing the migration). Guided by the findings on our empirical study, we created MLCatchUp, a tool to automate the update of Python deprecated API usage that automatically infers the API migration transformation through comparison of the deprecated and updated API signatures. These transformations are expressed in a Domain Specific Language (DSL). We evaluated MLCatchUp using test dataset containing 258 files with 514 API usages that we collected from public GitHub repositories. In this evaluation, MLCatchUp achieves a precision of 86.19%. We further improve the precision of MLCatchUp by adding a feature that allows it to accept additional user input to specify the transformation constraints in the DSL for context-dependent API migration, where MLCatchUp achieves a precision of 93.58%. △ Less

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2005.13220 [pdf, other]

Automatic Android Deprecated-API Usage Update by Learning from Single Updated Example

Authors: Stefanus Agus Haryono, Ferdian Thung, Hong Jin Kang, Lucas Serrano, Gilles Muller, Julia Lawall, David Lo, Lingxiao Jiang

Abstract: Due to the deprecation of APIs in the Android operating system,developers have to update usages of the APIs to ensure that their applications work for both the past and current versions of Android.Such updates may be widespread, non-trivial, and time-consuming. Therefore, automation of such updates will be of great benefit to developers. AppEvolve, which is the state-of-the-art tool for automating… ▽ More Due to the deprecation of APIs in the Android operating system,developers have to update usages of the APIs to ensure that their applications work for both the past and current versions of Android.Such updates may be widespread, non-trivial, and time-consuming. Therefore, automation of such updates will be of great benefit to developers. AppEvolve, which is the state-of-the-art tool for automating such updates, relies on having before- and after-update examples to learn from. In this work, we propose an approach named CocciEvolve that performs such updates using only a single after-update example. CocciEvolve learns edits by extracting the relevant update to a block of code from an after-update example. From preliminary experiments, we find that CocciEvolve can successfully perform 96 out of 112 updates, with a success rate of 85%. △ Less

Submitted 27 May, 2020; originally announced May 2020.

Comments: 5 pages, 8 figures. Accepted in The International Conference on Program Comprehension (ICPC) 2020, ERA Track

ACM Class: I.2.2

Showing 1–6 of 6 results for author: Haryono, S A