Edward J. Schwartz · Computer Security Researcher · 1 min. read

My family and I had quite the day attending the 2023 Central Pennsylvania Open Source Conference (CPOSC) yesterday at the Ware Center in Lancaster, PA. In addition to attending, we spoke in three different talks!

My wife, Dr. Stephanie Schwartz, kicked things off.

Stephanie presenting
She gave a short primer on Random Forests to set up the talk by her student Samantha Noggle, "Machine Learning Techniques to Improve Users' Music Listening Experiences". I thought Samantha did a great job on her presentation. It was an interesting topic, and she presented it at just the right level of detail.

Next, my stepson Nick Elzer gave his first-ever public conference presentation, along with his coworker from Quub, Nathaniel Every, and they did a great job!

Nick and Nathaniel presenting
Their presentation, "SpaceHeX Beta1.0 Release - What If, For Space Hardware Development, We Put Each Egg In Its Own Basket?", was about SpaceHex, a prototyping system they open-sourced to make it easier for people to break into satellite development. They also demoed a pretty sweet omnidirectional rover that they built (at 2am the night before, naturally).

Finally, I gave a tutorial presentation called "Introduction to Exploiting Stack Buffer Overflow Vulnerabilities". If you want, you can follow along here and with the videos below.

Me speaking

After a hiatus of a few years, it was great to have CPOSC back in person in 2023. I saw a number of great presentations, and met a lot of interesting and smart people. It's always surprising how many technical people work in an area that is known for its rural farming!

Videos


Variable Recovery

As far back as I can remember, one of the accepted dogmas of reverse engineering is that when a program is compiled, some information about the program is lost, and there is no way to recover it. Variable names are one of the most frequently cited casualties of this idea. In fact, this argument is used by countless authors in the introductions of their papers to explain some of the unique challenges that binary analysis has compared to source analysis.

You can imagine how shocking (and cool!) it was, then, when my colleagues and I found that it's possible to recover a large percentage of variable names. Our key insight is that the semantics of the binary code that accesses a variable is actually a surprisingly good signal for what the variable was named. To exploit this, we took a huge amount of source code from GitHub, compiled it, and then decompiled it. We then trained a model to predict the variable names, which we can't see in an executable, based on the way that the code accesses those variables, which we can see in an executable. In Meaningful Variable Names for Decompiled Code: A Machine Translation Approach, we showed that this was possible using Statistical Machine Translation, which is one of the techniques used to translate between natural languages.
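To make the "usage predicts the name" idea concrete, here is a deliberately tiny sketch. It is not the statistical machine translation model from the paper, nor a neural model; it is a simple vote-counting toy. The usage tokens (such as `loop_counter` or `passed_to_malloc`), the training pairs, and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training pairs: (tokens describing how decompiled code uses a
# variable, the name that variable had in the original source code).
TRAINING_PAIRS = [
    (["loop_counter", "array_index", "compare_lt"], "i"),
    (["loop_counter", "array_index", "increment"], "i"),
    (["passed_to_strlen", "passed_to_strcpy"], "str"),
    (["passed_to_strlen", "array_index_base"], "buf"),
    (["passed_to_malloc", "compare_lt"], "size"),
    (["passed_to_malloc", "passed_to_memcpy"], "size"),
]


def train(pairs):
    """Count how often each usage token co-occurs with each source name."""
    votes = defaultdict(Counter)
    for tokens, name in pairs:
        for token in tokens:
            votes[token][name] += 1
    return votes


def predict(votes, tokens, default="v1"):
    """Sum the per-token counts and return the name with the most votes.

    Falls back to a decompiler-style placeholder name when no token has
    been seen during training.
    """
    totals = Counter()
    for token in tokens:
        totals.update(votes.get(token, {}))
    if not totals:
        return default
    return totals.most_common(1)[0][0]


model = train(TRAINING_PAIRS)
print(predict(model, ["loop_counter", "increment"]))  # prints "i"
print(predict(model, ["passed_to_malloc"]))           # prints "size"
print(predict(model, ["never_seen_before"]))          # prints "v1"
```

A real system would derive its features from the decompiler's output rather than hand-written tokens, and would use a far richer model, but the training signal has the same shape: pairs of observable usage and the name that was lost in compilation.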

I'm happy to announce that our latest paper on this subject, DIRE: A Neural Approach to Decompiled Identifier Renaming, was accepted to ASE 2019. In this paper, we found that neural network models work really well for recovering variable names in decompiled code. I'll post our camera-ready version as soon as it's finished.

Code Reuse

I'll be speaking at the 2019 Central Pennsylvania Open Source Conference (CPOSC) in September. I attended CPOSC for the first time last year, and was very impressed by the quality of the talks. The name is a little misleading; talks are not necessarily related to open source software. I'll actually be giving a primer on code reuse attacks.
