A Taxonomy of C Decompiler Fidelity Issues

Download: PDF.

“A Taxonomy of C Decompiler Fidelity Issues” by Luke Dramko, Jeremy Lacomis, Edward J. Schwartz, Bogdan Vasilescu, and Claire Le Goues. In Proceedings of the USENIX Security Symposium, 2024.

Abstract

Decompilation is an important part of analyzing threats in computer security. Unfortunately, decompiled code contains less information than the corresponding original source code, which makes understanding it more difficult for the reverse engineers who manually perform threat analysis. Thus, the fidelity of decompiled code to the original source code matters, as it can influence reverse engineers' productivity. There is some existing work on predicting some of the missing information using statistical methods, but it focuses largely on variable names and variable types. In this work, we more holistically evaluate decompiler output from C-language executables and use our findings to inform directions for future decompiler development. More specifically, we use open-coding techniques to identify defects in decompiled code beyond missing names and types. To ensure that our study is robust, we compare and evaluate four different decompilers. Using thematic analysis, we build a taxonomy of decompiler defects. Using this taxonomy to reason about classes of issues, we suggest specific approaches that can be used to mitigate fidelity issues in decompiled code.

BibTeX entry:

@inproceedings{dramko:2024,
   author = {Luke Dramko and Jeremy Lacomis and Edward J. Schwartz and
             Bogdan Vasilescu and Claire Le Goues},
   title = {A Taxonomy of C Decompiler Fidelity Issues},
   booktitle = {Proceedings of the {USENIX} Security Symposium},
   year = {2024}
}
