-
Gas Gauge: A Security Analysis Tool for Smart Contract Out-of-Gas Vulnerabilities
Authors:
Behkish Nassirzadeh,
Huaiying Sun,
Sebastian Banescu,
Vijay Ganesh
Abstract:
In recent years we have witnessed a dramatic increase in the adoption and application of smart contracts in a variety of contexts such as decentralized finance, supply chain management, and identity management. However, a critical stumbling block to the further adoption of smart contracts is their security. A particularly widespread class of security vulnerabilities that afflicts Ethereum smart co…
▽ More
In recent years we have witnessed a dramatic increase in the adoption and application of smart contracts in a variety of contexts such as decentralized finance, supply chain management, and identity management. However, a critical stumbling block to the further adoption of smart contracts is their security. A particularly widespread class of security vulnerabilities that afflicts Ethereum smart contracts is the gas limit denial of service(DoS) on a contract via unbounded operations. These vulnerabilities result in a failed transaction with an out-of-gas error and are often present in contracts containing loops whose bounds are affected by end-user input. Note that such vulnerabilities differ from gas limit DoS on the network via block stuffing. Therefore, we present Gas Gauge, a tool aimed at detecting Out-of-Gas DoS vulnerabilities in Ethereum smart contracts. Gas Gauge consists of three major components: the Detection, Identification, and Correction Phases. The Detection Phase consists of an accurate static analysis approach that finds and summarizes all the loops in a smart contract. The Identification Phase uses a white-box fuzzing approach to generate a set of inputs that causes the contract to run out of gas. The Correction Phase uses static analysis and run-time verification to predict the maximum loop bounds consistent with allowable gas usage and suggest appropriate repairs to the user of the tool. Each part of the tool can be used separately for different purposes or all together to detect, identify and help repair the contracts vulnerable to Out-of-Gas DoS vulnerabilities. Gas Gauge was tested on 1,000 real-world solidity smart contracts deployed on the Ethereum Mainnet. The results were compared to seven state-of-the-art static and symbolic tools, and it was empirically demonstrated that Gas Gauge is far more effective than competing state-of-the-art tools.
△ Less
Submitted 9 July, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection
Authors:
Aleieldin Salem,
Sebastian Banescu,
Alexander Pretschner
Abstract:
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 antiviral scanners. Unfortunately, there are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies (e.g., if ten or more scanners deem an…
▽ More
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 antiviral scanners. Unfortunately, there are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies (e.g., if ten or more scanners deem an app malicious, it is considered malicious). While some of the utilized thresholds may be able to accurately approximate the ground truths of apps, the fact that VirusTotal changes the set and versions of the scanners it uses makes such thresholds unsustainable over time. We implemented a method, Maat, that tackles these issues of standardization and sustainability by automatically generating a Machine Learning (ML)-based labeling scheme, which outperforms threshold-based labeling strategies. Using the VirusTotal scan reports of 53K Android apps that span one year, we evaluated the applicability of Maat's ML-based labeling strategies by comparing their performance against threshold-based strategies. We found that such ML-based strategies (a) can accurately and consistently label apps based on their VirusTotal scan reports, and (b) contribute to training ML-based detection methods that are more effective at classifying out-of-sample apps than their threshold-based counterparts.
△ Less
Submitted 1 July, 2020;
originally announced July 2020.
-
MPro: Combining Static and Symbolic Analysis for Scalable Testing of Smart Contract
Authors:
William Zhang,
Sebastian Banescu,
Leonardo Passos,
Steven Stewart,
Vijay Ganesh
Abstract:
Smart contracts are executable programs that enable the building of a programmable trust mechanism between multiple entities without the need of a trusted third-party. Researchers have developed several security scanners in the past couple of years. However, many of these analyzers either do not scale well, or if they do, produce many false positives. This issue is exacerbated when bugs are trigge…
▽ More
Smart contracts are executable programs that enable the building of a programmable trust mechanism between multiple entities without the need of a trusted third-party. Researchers have developed several security scanners in the past couple of years. However, many of these analyzers either do not scale well, or if they do, produce many false positives. This issue is exacerbated when bugs are triggered only after a series of interactions with the functions of the contract-under-test. A depth-n vulnerability, refers to a vulnerability that requires invoking a specific sequence of n functions to trigger. Depth-n vulnerabilities are time-consuming to detect by existing automated analyzers, because of the combinatorial explosion of sequences of functions that could be executed on smart contracts.
In this paper, we present a technique to analyze depth-n vulnerabilities in an efficient and scalable way by combining symbolic execution and data dependency analysis. A significant advantage of combining symbolic with static analysis is that it scales much better than symbolic alone and does not have the problem of false positive that static analysis tools typically have. We have implemented our technique in a tool called MPro, a scalable and automated smart contract analyzer based on the existing symbolic analysis tool Mythril-Classic and the static analysis tool Slither. We analyzed 100 randomly chosen smart contracts on MPro and our evaluation shows that MPro is about n-times faster than Mythril-Classic for detecting depth-n vulnerabilities, while preserving all the detection capabilities of Mythril-Classic.
△ Less
Submitted 14 December, 2020; v1 submitted 1 November, 2019;
originally announced November 2019.
-
VirtSC: Combining Virtualization Obfuscation with Self-Checksumming
Authors:
Mohsen Ahmadvand,
Daniel Below,
Sebastian Banescu,
Alexander Pretschner
Abstract:
Self-checksumming (SC) is a tamper-proofing technique that ensures certain program segments (code) in memory hash to known values at runtime. SC has few restrictions on application and hence can protect a vast majority of programs. The code verification in SC requires computation of the expected hashes after compilation, as the machine-code is not known before. This means the expected hash values…
▽ More
Self-checksumming (SC) is a tamper-proofing technique that ensures certain program segments (code) in memory hash to known values at runtime. SC has few restrictions on application and hence can protect a vast majority of programs. The code verification in SC requires computation of the expected hashes after compilation, as the machine-code is not known before. This means the expected hash values need to be adjusted in the binary executable, hence combining SC with other protections is limited due to this adjustment step. However, obfuscation protections are often necessary, as SC protections can be otherwise easily detected and disabled via pattern matching. In this paper, we present a layered protection using virtualization obfuscation, yielding an architecture-agnostic SC protection that requires no post-compilation adjustment. We evaluate the performance of our scheme using a dataset of 25 real-world programs (MiBench and 3 CLI games). Our results show that the SC scheme induces an average overhead of 43% for a complete protection (100% coverage). The overhead is tolerable for less CPU-intensive programs (e.g. games) and when only parts of programs (e.g. license checking) are protected. However, large overheads stemming from the virtualization obfuscation were encountered.
△ Less
Submitted 25 September, 2019;
originally announced September 2019.
-
SIP Shaker: Software Integrity Protection Composition
Authors:
Mohsen Ahmadvand,
Dennis Fischer,
Sebastian Banescu
Abstract:
Man-At-The-End (MATE) attackers are almighty adversaries against whom there exists no silver-bullet countermeasure. To raise the bar, a wide range of protection measures were proposed in the literature each of which adds resilience against certain attacks on certain digital assets of a program. Intuitively, composing a set of protections (rather than applying just one of them) can mitigate a wider…
▽ More
Man-At-The-End (MATE) attackers are almighty adversaries against whom there exists no silver-bullet countermeasure. To raise the bar, a wide range of protection measures were proposed in the literature each of which adds resilience against certain attacks on certain digital assets of a program. Intuitively, composing a set of protections (rather than applying just one of them) can mitigate a wider range of attacks and hence offer a higher level of security. Despite the potential benefits, very limited research has been done on the composition of protections. Naive compositions could lead to conflicts which, in turn, limit the application of protections, raise false alarms, and worse yet, yield corrupted binaries. More importantly, inadequate compositions of such protections are not tailored for the program at hand and thus the offered security and performance are sub-optimal. In this paper, we first lay out a set of generic constraints for a conflict-free composition of protections. Then, we develop a composition framework based on a defense graph in which nodes and edges capture protections, their relations, and constraints. The conflicts problem together with optimization requirements are then translated into a set of integer constraints. We then use Integer Linear Programming (ILP) to handle conflicts while optimizing for a higher security and lower overhead. To measure the overhead, we use a set of real-world programs (MiBench dataset and open source games). Our evaluation results indicate that our composition framework reduces the overhead by $\approx$ 39% while maximizing the coverage. Moreover, our approach yields a 5-fold decrease in overhead compared to state-of-the-art heuristics.
△ Less
Submitted 25 September, 2019;
originally announced September 2019.
-
Don't Pick the Cherry: An Evaluation Methodology for Android Malware Detection Methods
Authors:
Aleieldin Salem,
Sebastian Banescu,
Alexander Pretschner
Abstract:
In evaluating detection methods, the malware research community relies on scan results obtained from online platforms such as VirusTotal. Nevertheless, given the lack of standards on how to interpret the obtained data to label apps, researchers hinge on their intuitions and adopt different labeling schemes. The dynamicity of VirusTotal's results along with adoption of different labeling schemes si…
▽ More
In evaluating detection methods, the malware research community relies on scan results obtained from online platforms such as VirusTotal. Nevertheless, given the lack of standards on how to interpret the obtained data to label apps, researchers hinge on their intuitions and adopt different labeling schemes. The dynamicity of VirusTotal's results along with adoption of different labeling schemes significantly affect the accuracies achieved by any given detection method even on the same dataset, which gives subjective views on the method's performance and hinders the comparison of different malware detection techniques.
In this paper, we demonstrate the effect of varying (1) time, (2) labeling schemes, and (3) attack scenarios on the performance of an ensemble of Android repackaged malware detection methods, called dejavu, using over 30,000 real-world Android apps. Our results vividly show the impact of varying the aforementioned 3 dimensions on dejavu's performance. With such results, we encourage the adoption of a standard methodology that takes into account those 3 dimensions in evaluating newly-devised methods to detect Android (repackaged) malware.
△ Less
Submitted 25 March, 2019;
originally announced March 2019.
-
Identifying Relevant Information Cues for Vulnerability Assessment Using CVSS
Authors:
Luca Allodi,
Sebastian Banescu,
Henning Femmer,
Kristian Beckers
Abstract:
The assessment of new vulnerabilities is an activity that accounts for information from several data sources and produces a `severity' score for the vulnerability. The Common Vulnerability Scoring System (\CVSS) is the reference standard for this assessment. Yet, no guidance currently exists on \emph{which information} aids a correct assessment and should therefore be considered.
In this paper w…
▽ More
The assessment of new vulnerabilities is an activity that accounts for information from several data sources and produces a `severity' score for the vulnerability. The Common Vulnerability Scoring System (\CVSS) is the reference standard for this assessment. Yet, no guidance currently exists on \emph{which information} aids a correct assessment and should therefore be considered.
In this paper we address this problem by evaluating which information cues increase (or decrease) assessment accuracy.
We devise a block design experiment with 67 software engineering students with varying vulnerability information and measure scoring accuracy under different information sets.
We find that baseline vulnerability descriptions provided by standard vulnerability sources provide only part of the information needed to achieve an accurate vulnerability assessment. Further, we find that additional information on \texttt{assets}, \texttt{attacks}, and \texttt{vulnerability type} contributes in increasing the accuracy of the assessment; conversely, information on \texttt{known threats} misleads the assessor and decreases assessment accuracy and should be avoided when assessing vulnerabilities. These results go in the direction of formalizing the vulnerability communication to, for example, fully automate security assessments.
△ Less
Submitted 20 March, 2018;
originally announced March 2018.
-
Reasoning about Probabilistic Defense Mechanisms against Remote Attacks
Authors:
Martín Ochoa,
Sebastian Banescu,
Cynthia Disenfeld,
Gilles Barthe,
Vijay Ganesh
Abstract:
Despite numerous countermeasures proposed by practitioners and researchers, remote control-flow alteration of programs with memory-safety vulnerabilities continues to be a realistic threat. Guaranteeing that complex software is completely free of memory-safety vulnerabilities is extremely expensive. Probabilistic countermeasures that depend on random secret keys are interesting, because they are a…
▽ More
Despite numerous countermeasures proposed by practitioners and researchers, remote control-flow alteration of programs with memory-safety vulnerabilities continues to be a realistic threat. Guaranteeing that complex software is completely free of memory-safety vulnerabilities is extremely expensive. Probabilistic countermeasures that depend on random secret keys are interesting, because they are an inexpensive way to raise the bar for attackers who aim to exploit memory-safety vulnerabilities. Moreover, some countermeasures even support legacy systems. However, it is unclear how to quantify and compare the effectiveness of different probabilistic countermeasures or combinations of such countermeasures. In this paper we propose a methodology to rigorously derive security bounds for probabilistic countermeasures. We argue that by representing security notions in this setting as events in probabilistic games, similarly as done with cryptographic security definitions, concrete and asymptotic guarantees can be obtained against realistic attackers. These guarantees shed light on the effectiveness of single countermeasures and their composition and allow practitioners to more precisely gauge the risk of an attack.
△ Less
Submitted 17 February, 2017; v1 submitted 24 January, 2017;
originally announced January 2017.
-
The Meaning of Attack-Resistant Systems
Authors:
Vijay Ganesh,
Sebastian Banescu,
Martín Ochoa
Abstract:
In this paper, we introduce a formal notion of partial compliance, called Attack-resistance, of a computer program running together with a defense mechanism w.r.t a non-exploitability specification. In our setting, a program may contain exploitable vulnerabilities, such as buffer overflows, but appropriate defense mechanisms built into the program or the operating system render such vulnerabilitie…
▽ More
In this paper, we introduce a formal notion of partial compliance, called Attack-resistance, of a computer program running together with a defense mechanism w.r.t a non-exploitability specification. In our setting, a program may contain exploitable vulnerabilities, such as buffer overflows, but appropriate defense mechanisms built into the program or the operating system render such vulnerabilities hard to exploit by certain attackers, usually relying on the strength of the randomness of a probabilistic transformation of the environment or the program and some knowledge on the attacker's goals and attack strategy. We are motivated by the reality that most large-scale programs have vulnerabilities despite our best efforts to get rid of them. Security researchers have responded to this state of affairs by coming up with ingenious defense mechanisms such as address space layout randomization (ASLR) or instruction set randomization (ISR) that provide some protection against exploitation. However, implementations of such mechanism have been often shown to be insecure, even against the attacks they were designed to prevent. By formalizing this notion of attack-resistance we pave the way towards addressing the questions: "How do we formally analyze defense mechanisms? Is there a mathematical way of distinguishing effective defense mechanisms from ineffective ones? Can we quantify and show that these defense mechanisms provide formal security guarantees, albeit partial, even in the presence of exploitable vulnerabilities?". To illustrate our approach we discuss under which circumstances ISR implementations comply with the Attack-resistance definition.
△ Less
Submitted 11 June, 2015; v1 submitted 13 February, 2015;
originally announced February 2015.
-
FEEBO: An Empirical Evaluation Framework for Malware Behavior Obfuscation
Authors:
Sebastian Banescu,
Tobias Wüchner,
Marius Guggenmos,
Martín Ochoa,
Alexander Pretschner
Abstract:
Program obfuscation is increasingly popular among malware creators. Objectively comparing different malware detection approaches with respect to their resilience against obfuscation is challenging. To the best of our knowledge, there is no common empirical framework for evaluating the resilience of malware detection approaches w.r.t. behavior obfuscation. We propose and implement such a framework…
▽ More
Program obfuscation is increasingly popular among malware creators. Objectively comparing different malware detection approaches with respect to their resilience against obfuscation is challenging. To the best of our knowledge, there is no common empirical framework for evaluating the resilience of malware detection approaches w.r.t. behavior obfuscation. We propose and implement such a framework that obfuscates the observable behavior of malware binaries. To assess the framework's utility, we use it to obfuscate known malware binaries and then investigate the impact on detection effectiveness of different $n$-gram based detection approaches. We find that the obfuscation transformations employed by our framework significantly affect the precision of such detection approaches. Several $n$-gram-based approaches can hence be concluded not to be resilient against this simple kind of obfuscation.
△ Less
Submitted 13 February, 2015; v1 submitted 11 February, 2015;
originally announced February 2015.