Discovery of Endianness and Instruction Size Characteristics in Binary Programs from Unknown Instruction Set Architectures

Andreassen, Joachim; Morrison, Donn

Abstract: We study the problem of streamlining reverse engineering (RE) of binary programs from unknown instruction set architectures (ISAs). We focus on two ISA characteristics that are fundamental to beginning the RE process: endianness, and whether the instruction width is fixed or variable. For ISAs with a fixed instruction width, we also present methods for estimating the width. In addition to advancing research in software RE, our work can be seen as a first step in hardware reverse engineering, because endianness and instruction format are intrinsic characteristics of the underlying ISA. We detail our feature engineering efforts and run experiments with a variety of machine learning models on two datasets of architectures, using Leave-One-Group-Out Cross-Validation to simulate conditions where the tested ISA is unknown during model training. We use bigram-based features for endianness detection, and the autocorrelation function, commonly used in signal processing, to differentiate between fixed- and variable-width instruction sizes. A collection of classifiers from the machine learning library scikit-learn is used in the experiments to evaluate these features. Initial results are promising, with accuracies of 99.4% for endianness detection, 86.0% for distinguishing fixed- from variable-width instruction sizes, and 88.0% for detecting the fixed instruction size.
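
As an illustration of the autocorrelation idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the raw code bytes are treated as a numeric signal and that a fixed instruction width tends to appear as high autocorrelation at lags that are multiples of that width. The function names `autocorrelation` and `strongest_lag`, the `max_lag` parameter, and the synthetic 4-byte "instructions" are all hypothetical and chosen only for this example.

```python
import random


def autocorrelation(data: bytes, max_lag: int = 16) -> list:
    """Return the sample autocorrelation of a byte sequence for lags 1..max_lag.

    Illustrative only: the paper's exact feature definition is not given in
    the abstract, so this uses the textbook normalised autocorrelation.
    """
    n = len(data)
    if n <= max_lag:
        return [0.0] * max_lag
    mean = sum(data) / n
    centered = [b - mean for b in data]
    variance = sum(c * c for c in centered)
    if variance == 0:
        # A constant signal carries no periodicity information.
        return [0.0] * max_lag
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        for lag in range(1, max_lag + 1)
    ]


def strongest_lag(acf: list) -> int:
    """Return the 1-based lag with the highest autocorrelation."""
    return max(range(len(acf)), key=acf.__getitem__) + 1


if __name__ == "__main__":
    # Synthetic stand-in for a fixed-width ISA (an assumption for this demo):
    # every 4-byte "instruction" starts with the same byte and ends with a
    # small value, while the middle bytes are random.
    rng = random.Random(0)
    code = bytes(
        b
        for _ in range(500)
        for b in (0xE5, rng.randrange(256), rng.randrange(256), rng.randrange(16))
    )
    acf = autocorrelation(code, max_lag=16)
    print("autocorrelation by lag:", [round(v, 2) for v in acf])
    print("strongest lag:", strongest_lag(acf))
```

In this sketch the strongest lag may be any multiple of the true width, since all such lags correlate about equally well; a values-per-lag vector like `acf` would more plausibly be passed to a classifier as features than thresholded directly.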

Subjects:	Cryptography and Security (cs.CR)
Cite as:	arXiv:2410.21558 [cs.CR]
	(or arXiv:2410.21558v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2410.21558
