Quantifying and Mitigating the Impact of Obfuscations on Machine-Learning-Based Decompilation Improvement

Download: Paper.

“Quantifying and Mitigating the Impact of Obfuscations on Machine-Learning-Based Decompilation Improvement” by Luke Dramko, Deniz Bölöni-Turgut, Claire Le Goues, and Edward Schwartz. In Proceedings of the IEEE Conference on Detection of Intrusions, Malware, and Vulnerability Assessment, 2025.

Abstract

Decompilers are tools that reverse the process of compilation, converting executable binaries into a high-level language like C. They are useful in situations where the original source code is unavailable, such as when analyzing malware, doing vulnerability research, and patching legacy software. Unfortunately, decompilation is necessarily incomplete, because the compiler discards many of the abstractions that make source code readable, like identifier names and types. A large body of existing work uses machine learning to predict missing names, types, and other abstractions in decompiled code. However, little of this work considers obfuscations: semantics-preserving transformations that obscure the functionality and design of a program. At the same time, obfuscations are common in practice, especially in malware. In this work, we perform a quantitative analysis of the impact that obfuscations have on decompiled code. Further, we investigate the degree to which training on obfuscated code mitigates the impact of obfuscations. We perform our experiments on three different models from the literature: DIRTY, HexT5, and VarBERT. We find that obfuscations do negatively impact machine learning models, but training on obfuscations can partially help recover lost accuracy.
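
As a rough illustration of what "semantics-preserving transformation" means in the abstract above, the following C sketch pairs a plain function with a hand-obfuscated equivalent. It is not taken from the paper: the function names, the stripped-looking identifiers, and the two transformations shown (an opaque predicate and an arithmetic instruction substitution) are assumptions chosen for illustration, and may not match the obfuscations the authors evaluated.

```c
#include <stdint.h>

/* Plain version: sum the first n elements (arithmetic wraps modulo 2^32). */
uint32_t sum_plain(const uint32_t *values, int n) {
    uint32_t total = 0;
    for (int i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Hypothetical obfuscated version with the same input/output behavior.
 * Identifiers are deliberately uninformative, as in decompiler output. */
uint32_t sum_obfuscated(const uint32_t *a1, int a2) {
    uint32_t v1 = 0;
    int v2 = 0;
    while (v2 < a2) {
        uint32_t x = a1[v2];
        /* Opaque predicate: x * (x + 1) is a product of consecutive
         * integers, so it is always even and this branch is always taken. */
        if (((x * (x + 1u)) & 1u) == 0u) {
            /* Instruction substitution: a + b == (a ^ b) + 2 * (a & b). */
            v1 = (v1 ^ x) + ((v1 & x) << 1);
        } else {
            v1 -= x; /* Dead code: never executed. */
        }
        v2++;
    }
    return v1;
}
```

Both functions return the same value for every input, yet the second gives a name- or type-prediction model far fewer recognizable patterns to work with.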

BibTeX entry:

@inproceedings{dramko:2025:dimva,
   author = {Luke Dramko and Deniz B{\"o}l{\"o}ni-Turgut and Claire Le
	Goues and Edward Schwartz},
   title = {Quantifying and Mitigating the Impact of Obfuscations on
	Machine-Learning-Based Decompilation Improvement},
   booktitle = {Proceedings of the {IEEE} Conference on Detection of
	Intrusions, Malware, and Vulnerability Assessment},
   year = {2025}
}
