🎉 New Research Published at DSN 2025

I'm excited to announce that "A Human Study of Automatically Generated Decompiler Annotations" has been published at the 2025 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2025)!

The Research Team

This work represents the culmination of Jeremy Lacomis's Ph.D. research, alongside our fantastic collaborators:

  • Vanderbilt University: Yuwei Yang, Skyler Grandel, and Kevin Leach
  • Carnegie Mellon University: Bogdan Vasilescu and Claire Le Goues

What We Studied

This paper investigates a critical question in reverse engineering: Do automatically generated variable names and type annotations actually help human analysts understand decompiled code?

Our study built upon DIRTY, our machine learning system that automatically generates meaningful variable names and type information for decompiled binaries. While DIRTY showed promising technical results, we wanted to understand its real-world impact on human reverse engineers.

Key Findings

  • Surprisingly, the annotations did not significantly improve participants' task completion speed or accuracy
  • This challenges assumptions about the direct correlation between code readability and task performance
  • Participants preferred code with annotations over plain decompiled output

Read More

Interested in the full methodology and detailed results? Download the complete paper to dive deeper into our human study design, statistical analysis, and implications for future decompilation tools.


Can existing neural decompiler artifacts be used on new examples? Here are some notes on the current state of the art. I assign each decompiler a score from 0 to 10 based on how easy it is to apply its publicly available artifacts to a new example.

SLaDe: 2/10

SLaDe has a publicly released replication artifact, but several problems prevent it from being used on new examples:

  1. The models are trained on assembly code produced by compilers rather than by disassemblers. This is probably a minor issue.
  2. More problematically, SLaDe uses I/O test cases during beam search to help select the best candidate. It can be run without them, but the results will be worse, and SLaDe does not include a mechanism for producing test cases for new examples (see the sketch after the quote below).

Below is a quote from a private conversation with the author:

You are right that IO are somehow used to select in the beam search, in the sense that we report pass@5. They are not strictly required to get the outputs though.

The link you sent is for the program synthesis dataset. In this one, IO generation was programmatic but still kind of manual, I don't think it would be feasible to automatically generate the props file in the general case. For the Github functions, we have a separate repo that automatically generates IO tests, but those are randomly generated and the quality depends on each case. If I had to redo now, I would ask an LLM to generate unit tests! I can give you access to the private repo we used to automatically generate the IO examples for the general case if you wish, but now I'd do it with LLMs rather than randomly.
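To make concrete how I/O tests can steer candidate selection, here is a minimal sketch of re-ranking beam-search outputs by the number of test cases they pass. This is not SLaDe's actual code; the `compile_and_run` harness and the test-case format are assumptions for illustration.

```python
# Illustrative sketch only: re-rank beam-search candidates by how many
# I/O test cases they pass. SLaDe's real implementation differs; the
# helper names and test-case format here are hypothetical.
from typing import Callable, List, Tuple

IOTest = Tuple[tuple, object]  # (input arguments, expected output)

def rank_candidates(
    candidates: List[str],                             # decompiled C candidates from beam search
    io_tests: List[IOTest],                            # I/O test cases for the target function
    compile_and_run: Callable[[str, tuple], object],   # hypothetical compile-and-execute harness
) -> List[str]:
    """Return candidates sorted by the number of I/O tests they pass."""
    def score(candidate: str) -> int:
        passed = 0
        for args, expected in io_tests:
            try:
                if compile_and_run(candidate, args) == expected:
                    passed += 1
            except Exception:
                pass  # candidate did not compile or crashed on this input
        return passed

    # With no I/O tests (io_tests == []), every candidate scores 0 and the
    # original beam order is kept, which is why results degrade without them.
    return sorted(candidates, key=score, reverse=True)
```

This also illustrates why the lack of a test-case generator matters: without I/O tests for a new example, the re-ranking step has nothing to work with.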

LLM4Decompile: 9/10

LLM4Decompile publishes its model files on HuggingFace, and they can easily be run on new examples. I created a few HuggingFace Spaces for testing.
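For reference, here is a minimal sketch of loading one of the published checkpoints with the transformers library. The model ID and prompt format below are from memory and should be checked against the model card (the v1.5 models expect disassembly, while the v2 models expect decompiler pseudocode), so treat them as assumptions.

```python
# Minimal sketch: load an LLM4Decompile checkpoint from HuggingFace and ask it
# to decompile one function. The checkpoint name and prompt template are
# assumptions; consult the model card for the exact expected input format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LLM4Binary/llm4decompile-1.3b-v1.5"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
).cuda()  # assumes a GPU is available

# Placeholder input: disassembly or pseudocode for a single function,
# formatted according to the model card's prompt template.
prompt = open("func0.input").read()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)

# Print only the newly generated tokens (the recovered C source).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```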

resym: 2/10

resym has a publicly released replication artifact. Unfortunately, as of February 2025, the artifact is missing the "prolog-based inference system for struct layout recovery," which is the key contribution of the paper, so it is not possible to run resym on new examples.

DeGPT: 8/10

DeGPT has a publicly released GitHub repository. I'm largely going from memory, but I have used it on new examples and it was relatively easy to use. I did have to file a few PRs, though.
