-
Who is Debugging the Debuggers? Exposing Debug Information Bugs in Optimized Binaries
Authors:
Giuseppe Antonio Di Luna,
Davide Italiano,
Luca Massarelli,
Sebastian Osterlund,
Cristiano Giuffrida,
Leonardo Querzoni
Abstract:
Despite the advancements in software testing, bugs still plague deployed software and result in crashes in production. When debugging issues -- sometimes caused by "heisenbugs" -- there is the need to interpret core dumps and reproduce the issue offline on the same binary deployed. This requires the entire toolchain (compiler, linker, debugger) to correctly generate and use debug information. Litt…
▽ More
Despite the advancements in software testing, bugs still plague deployed software and result in crashes in production. When debugging issues -- sometimes caused by "heisenbugs" -- there is the need to interpret core dumps and reproduce the issue offline on the same binary deployed. This requires the entire toolchain (compiler, linker, debugger) to correctly generate and use debug information. Little attention has been devoted to checking that such information is correctly preserved by modern toolchains' optimization stages. This is particularly important as managing debug information in optimized production binaries is non-trivial, often leading to toolchain bugs that may hinder post-deployment debugging efforts. In this paper, we present Debug$^{2}$, a framework to find debug information bugs in modern toolchains. Our framework feeds random source programs to the target toolchain and surgically compares the debugging behavior of their optimized/unoptimized binary variants. Such differential analysis allows Debug$^{2}$ to check invariants at each debugging step and detect bugs from invariant violations. Our invariants are based on the (in)consistency of common debug entities, such as source lines, stack frames, and function arguments. We show that, while simple, this strategy yields powerful cross-toolchain and cross-language invariants, which can pinpoint several bugs in modern toolchains. We have used Debug$^{2}$ to find 23 bugs in the LLVM toolchain (clang/lldb), 8 bugs in the GNU toolchain (GCC/gdb), and 3 in the Rust toolchain (rustc/lldb) -- with 14 bugs already fixed by the developers.
△ Less
Submitted 4 December, 2020; v1 submitted 27 November, 2020;
originally announced November 2020.
-
In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks
Authors:
Fiorella Artuso,
Giuseppe Antonio Di Luna,
Luca Massarelli,
Leonardo Querzoni
Abstract:
In this paper we investigate the problem of automatically naming pieces of assembly code. Where by naming we mean assigning to an assembly function a string of words that would likely be assigned by a human reverse engineer. We formally and precisely define the framework in which our investigation takes place. That is we define the problem, we provide reasonable justifications for the choices that…
▽ More
In this paper we investigate the problem of automatically naming pieces of assembly code. Where by naming we mean assigning to an assembly function a string of words that would likely be assigned by a human reverse engineer. We formally and precisely define the framework in which our investigation takes place. That is we define the problem, we provide reasonable justifications for the choices that we made for the design of training and the tests. We performed an analysis on a large real-world corpora constituted by nearly 9 millions of functions taken from more than 22k softwares. In such framework we test baselines coming from the field of Natural Language Processing (e.g., Seq2Seq networks and Transformer). Interestingly, our evaluation shows promising results beating the state-of-the-art and reaching good performance. We investigate the applicability of tine-tuning (i.e., taking a model already trained on a large generic corpora and retraining it for a specific task). Such technique is popular and well-known in the NLP field. Our results confirm that fine-tuning is effective even when neural networks are applied to binaries. We show that a model, pre-trained on the aforementioned corpora, when fine-tuned has higher performances on specific domains (such as predicting names in system utilites, malware, etc).
△ Less
Submitted 4 February, 2021; v1 submitted 17 December, 2019;
originally announced December 2019.
-
How Decoding Strategies Affect the Verifiability of Generated Text
Authors:
Luca Massarelli,
Fabio Petroni,
Aleksandra Piktus,
Myle Ott,
Tim Rocktäschel,
Vassilis Plachouras,
Fabrizio Silvestri,
Sebastian Riedel
Abstract:
Recent progress in pre-trained language models led to systems that are able to generate text of an increasingly high quality. While several works have investigated the fluency and grammatical correctness of such models, it is still unclear to which extent the generated text is consistent with factual world knowledge. Here, we go beyond fluency and also investigate the verifiability of text generat…
▽ More
Recent progress in pre-trained language models led to systems that are able to generate text of an increasingly high quality. While several works have investigated the fluency and grammatical correctness of such models, it is still unclear to which extent the generated text is consistent with factual world knowledge. Here, we go beyond fluency and also investigate the verifiability of text generated by state-of-the-art pre-trained language models. A generated sentence is verifiable if it can be corroborated or disproved by Wikipedia, and we find that the verifiability of generated text strongly depends on the decoding strategy. In particular, we discover a tradeoff between factuality (i.e., the ability of generating Wikipedia corroborated text) and repetitiveness. While decoding strategies such as top-k and nucleus sampling lead to less repetitive generations, they also produce less verifiable text. Based on these finding, we introduce a simple and effective decoding strategy which, in comparison to previously used decoding strategies, produces less repetitive and more verifiable text.
△ Less
Submitted 29 September, 2020; v1 submitted 8 November, 2019;
originally announced November 2019.
-
SAFE: Self-Attentive Function Embeddings for Binary Similarity
Authors:
Luca Massarelli,
Giuseppe Antonio Di Luna,
Fabio Petroni,
Leonardo Querzoni,
Roberto Baldoni
Abstract:
The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first…
▽ More
The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code in multi-dimensional vector representations (embeddings), and then comparing vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, that may fail in considering important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur in the computational overhead of building or manipulating control flow graphs), and is more general as it works on stripped binaries and on multiple architectures. We report the results from a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions. Furthermore, we show how clusters of our embedding vectors are closely related to the semantic of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search).
△ Less
Submitted 19 December, 2019; v1 submitted 13 November, 2018;
originally announced November 2018.
-
Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks
Authors:
Roberto Baldoni,
Giuseppe Antonio Di Luna,
Luca Massarelli,
Fabio Petroni,
Leonardo Querzoni
Abstract:
In this paper we consider the binary similarity problem that consists in determining if two binary functions are similar only considering their compiled form. This problem is know to be crucial in several application scenarios, such as copyright disputes, malware analysis, vulnerability detection, etc. The current state-of-the-art solutions in this field work by creating an embedding model that ma…
▽ More
In this paper we consider the binary similarity problem that consists in determining if two binary functions are similar only considering their compiled form. This problem is know to be crucial in several application scenarios, such as copyright disputes, malware analysis, vulnerability detection, etc. The current state-of-the-art solutions in this field work by creating an embedding model that maps binary functions into vectors in $\mathbb{R}^{n}$. Such embedding model captures syntactic and semantic similarity between binaries, i.e., similar binary functions are mapped to points that are close in the vector space. This strategy has many advantages, one of them is the possibility to precompute embeddings of several binary functions, and then compare them with simple geometric operations (e.g., dot product). In [32] functions are first transformed in Annotated Control Flow Graphs (ACFGs) constituted by manually engineered features and then graphs are embedded into vectors using a deep neural network architecture. In this paper we propose and test several ways to compute annotated control flow graphs that use unsupervised approaches for feature learning, without incurring a human bias. Our methods are inspired after techniques used in the natural language processing community (e.g., we use word2vec to encode assembly instructions). We show that our approach is indeed successful, and it leads to better performance than previous state-of-the-art solutions. Furthermore, we report on a qualitative analysis of functions embeddings. We found interesting cases in which embeddings are clustered according to the semantic of the original binary function.
△ Less
Submitted 13 November, 2018; v1 submitted 23 October, 2018;
originally announced October 2018.
-
Android Malware Family Classification Based on Resource Consumption over Time
Authors:
Luca Massarelli,
Leonardo Aniello,
Claudio Ciccotelli,
Leonardo Querzoni,
Daniele Ucci,
Roberto Baldoni
Abstract:
The vast majority of today's mobile malware targets Android devices. This has pushed the research effort in Android malware analysis in the last years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while d…
▽ More
The vast majority of today's mobile malware targets Android devices. This has pushed the research effort in Android malware analysis in the last years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while dynamic analysis has proven effective against such techniques. To the best of our knowledge, the most notable work on Android malware family classification purely based on dynamic analysis is DroidScribe. With respect to DroidScribe, our approach is easier to reproduce. Our methodology only employs publicly available tools, does not require any modification to the emulated environment or Android OS, and can collect data from physical devices. The latter is a key factor, since modern mobile malware can detect the emulated environment and hide their malicious behavior. Our approach relies on resource consumption metrics available from the proc file system. Features are extracted through detrended fluctuation analysis and correlation. Finally, a SVM is employed to classify malware into families. We provide an experimental evaluation on malware samples from the Drebin dataset, where we obtain a classification accuracy of 82%, proving that our methodology achieves an accuracy comparable to that of DroidScribe. Furthermore, we make the software we developed publicly available, to ease the reproducibility of our results.
△ Less
Submitted 4 September, 2017;
originally announced September 2017.