BinSimDB: Benchmark Dataset Construction for Fine-Grained Binary Code Similarity Analysis
Authors:
Fei Zuo,
Cody Tompkins,
Qiang Zeng,
Lannan Luo,
Yung Ryn Choe,
Junghwan Rhee
Abstract:
Binary Code Similarity Analysis (BCSA) has a wide spectrum of applications, including plagiarism detection, vulnerability discovery, and malware analysis, and has therefore drawn significant attention from the security community. However, conventional techniques often struggle to balance accuracy and scalability. To overcome these problems, a surge of deep learning-based work has recently been proposed. Unfortunately, many researchers still find it extremely difficult to conduct relevant studies or extend existing approaches. First, prior work typically relies on proprietary benchmarks without making the entire dataset publicly accessible. Consequently, large-scale, well-labeled datasets for binary code similarity analysis remain scarce. Moreover, previous work has primarily focused on comparison at the function level rather than exploring finer granularities. Therefore, we argue that the lack of a fine-grained dataset for BCSA leaves a critical gap in current research. To address these challenges, we construct a benchmark dataset for fine-grained binary code similarity analysis called BinSimDB, which contains equivalent pairs of smaller binary code snippets, such as basic blocks. Specifically, we propose the BMerge and BPair algorithms to bridge the discrepancies between two binary code snippets caused by different optimization levels or platforms. Furthermore, we empirically study the properties of our dataset and evaluate its effectiveness for BCSA research. The experimental results demonstrate that BinSimDB significantly improves the performance of binary code similarity comparison.
Submitted 14 October, 2024;
originally announced October 2024.
Intersection Graphs of Rays and Grounded Segments
Authors:
Jean Cardinal,
Stefan Felsner,
Tillmann Miltzow,
Casey Tompkins,
Birgit Vogtenhuber
Abstract:
We consider several classes of intersection graphs of line segments in the plane and prove new equality and separation results between those classes. In particular, we show that: (1) intersection graphs of grounded segments and intersection graphs of downward rays form the same graph class, (2) not every intersection graph of rays is an intersection graph of downward rays, and (3) not every intersection graph of rays is an outer segment graph. The first result answers an open problem posed by Cabello and Jejčič. The third result confirms a conjecture by Cabello. We thereby completely elucidate the remaining open questions on the containment relations between these classes of segment graphs. We further characterize the complexity of the recognition problems for the classes of outer segment, grounded segment, and ray intersection graphs. We prove that these recognition problems are complete for the existential theory of the reals. This holds even if a 1-string realization is given as additional input.
Submitted 12 December, 2016;
originally announced December 2016.
The Morphisms With Unstackable Image Words
Authors:
C. Robinson Tompkins
Abstract:
In an attempt to classify all of the overlap-free morphisms constructively using the Latin-square morphism, we came across an interesting counterexample, the Leech square-free morphism. We generalize the combinatorial properties of the Leech square-free morphism to gain insights on a larger class of both overlap-free morphisms and square-free morphisms.
Submitted 7 June, 2010;
originally announced June 2010.