Edward J. Schwartz: Computer Science Researcher

OOAnalyzer is one of my favorite projects for a few reasons. The underlying problem is deceptively hard. If you look closely enough, OO executables have a lot of evidence in them. When you consider each piece in isolation, it can seem simple. But when you try to combine all of the evidence, you wind up with a combinatorial explosion of possibilities that is pruned by constraints in very complex ways. I also like it because it's a very practical problem. People actually use OOAnalyzer, despite all of its limitations—a testament to how even flawed solutions to hard problems can be valuable.

OOAnalyzer is a successor to an earlier project that predated me at SEI called ObjDigger. OOAnalyzer's big innovation was to use what we called "hypothetical reasoning" about ambiguous scenarios. In a nutshell, we often faced a choice between several possibilities. For example, class D inherits from class B, and we see class M in class D's vftable. We know that M is either a method of D or a method of B. We can't tell which one it is, but we can make a guess, and see if that leads to a contradiction. If it does, then we know our guess was wrong, and we can eliminate that possibility and try the others. This is a powerful technique that was partially enabled by our use of Prolog in OOAnalyzer, specifically Prolog's backtracking search.

That being said, OOAnalyzer's hypothetical reasoning is not perfect. One problem we encountered while developing OOAnalyzer was that there could be a long gap between a guess and when the contradiction is detected. Let's say we make a guess that eventually will cause a contradiction, but we must make 10 other boolean guesses before detecting the contradiction. At that point, Prolog would backtrack the guesses, starting with the most recent guess. Unfortunately, Prolog has no way of knowing that the actual cause was much further back, and it would waste time re-exploring $2^{10}$ models that were doomed to fail. Over time, we began to shape the rules in OOAnalyzer to avoid this problem. We ordered the rules so that any guess likely to lead to a contradiction would be detected immediately. This worked, but also limited our reasoning power.

Another problem with OOAnalyzer is how it decides on a final model. The final model is simply the first one that allows all guesses (uncertainties) to be resolved and does not lead to a contradiction. OOAnalyzer's guessing rules are ordered so that the more important ones are made earlier, when they have less chance of conflicting with a previous decision. But there is no guarantee that this model is the best one, or even close to it! Over time, I formed the opinion that OOAnalyzer is really an optimization problem. Our rules permit multiple models that could explain the evidence. But some are better than others. For example, many OO programs without vftables and vbtables can be explained by a model with no OO classes or methods. This is valid, but not very useful for the analyst.

Finally, OOAnalyzer involves a lot of constraints. For example, we might know that a method is either a constructor, real destructor, or deleting destructor. Obviously if we learn that the same method is not a destructor, it must be a constructor. Because we didn't use constraints in OOAnalyzer, we had to implement this type of logic manually. It works, but it's not elegant.

Over time, I've wondered how to better frame the OOAnalyzer problem, and recently I started exploring Answer Set Programming (ASP) as a potential answer. ASP is a form of declarative programming that is based on the stable model semantics of logic programming. It is designed to solve combinatorial search problems, and it has built-in support for constraints and optimization. ASP has several key features:

It allows for declarative rules, like Prolog.
It has built-in support for constraints. 1 { constructor(X); real_destructor(X); deleting_destructor(X) } 1 means that exactly one of the three predicates must be true for any given X.
It has built-in support for optimization. We can assign weights to different models and ask the solver to find the global optimum, or the best solution found within a time limit.

In my spare time, I've been porting parts of OOAnalyzer to ASP here. I've been pleasantly surprised by how well it has worked so far. The code is much more concise and easier to read than the Prolog version. The constraints are much easier to express. For example, here's a rule that says if we see certain behavior (like installing vftables), we know that the method is a constructor or destructor:

% A method that writes a vftable/vbtable into its own this-pointer must be
% exactly one of: constructor, real destructor, or deleting destructor.
% Covers:
%   reasonConstructor (rules.pl:192) — elimination: only remaining candidate
%   reasonRealDestructor (rules.pl:388) — elimination: only remaining candidate
%   reasonDeletingDestructor (rules.pl:575) — elimination: only remaining candidate
%!trace_rule {"% is exactly one of constructor, real destructor, or deleting destructor", Method}
1 { constructor(Method) ; realDestructor(Method) ; deletingDestructor(Method) } 1 :-
    certainConstructorOrDestructor(Method).

Similarly, here's a rule saying that a method can only be one of a constructor, real destructor, or deleting destructor:

constructorDestructorKind(Method, constructor) :-
    constructor(Method).
constructorDestructorKind(Method, realDestructor) :-
    realDestructor(Method).
constructorDestructorKind(Method, deletingDestructor) :-
    deletingDestructor(Method).

% Covers:
%   insanityConstructorAndRealDestructor (insanity.pl:226)
%   insanityConstructorAndDeletingDestructor (insanity.pl:240)
%   reasonNOTConstructor_B (rules.pl:281) — realDestructor → notConstructor
%   reasonNOTConstructor_C (rules.pl:288) — deletingDestructor → notConstructor
%   reasonNOTRealDestructor_B (rules.pl:443) — constructor → notRealDestructor
%   reasonNOTRealDestructor_C (rules.pl:448) — deletingDestructor → notRealDestructor
%   reasonNOTDeletingDestructor_B (rules.pl:632) — constructor → notDeletingDestructor
%   reasonNOTDeletingDestructor_C (rules.pl:637) — realDestructor → notDeletingDestructor
%!trace_rule {"% has multiple constructor/destructor kinds", Method}
insanity(insanityMultipleConstructorDestructorKinds, (Method,Count)) :-
    method(Method),
    Count = #count { Kind : constructorDestructorKind(Method, Kind) },
    Count > 1.

As the comment above says, this rule covers about 8 different rules in OOAnalyzer. In ASP, we can express the same thing much more concisely.

OOAnalyzer-ASP can already read OOAnalyzer's facts format. You can see some examples of the results in the repository. Bear in mind that not all rules are ported yet.

One last thing I'll share is automatic explanations of models. I've been using the xclingo2 library to generate explanations for the ASP models. Even as OOAnalyzer's developer, it can be hard to understand how it reasons, because it could involve dozens of related conclusions. This is exacerbated in ASP because constraints and the optimization criteria are "silent". But even so, xclingo2 can generate detailed proof trees showing why a particular atom was included in the model. Here's an example of a proof tree for why a method is a constructor:

  |__4266400 is a constructor
  |  |__4266400 is a method;4266400 is a method because 4266176 calls it at this-offset 0
  |  |  |__thunk 4264566 resolves to 4266400
  |  |  |  |__thunk 4266400 resolves to 4266400
  |  |  |__4266176 is a method;4266176 is a method because it is a known constructor
  |  |  |  |__4266176 is a constructor;4266176 is exactly one of constructor, real destructor, or deleting destructor
  |  |  |  |  |__4266176 must be a constructor or destructor because it writes a vftable into its own this-pointer
  |  |  |  |  |  |__4266176 writes confirmed vftable 4290632 at offset 0
  |  |  |  |  |  |  |__4290632 is a confirmed vftable;4290632 is a confirmed vftable because RTTI says so

So is this the next generation OOAnalyzer? Theoretically, I think the answer is yes! But practically, it's unclear how well this is going to scale to real programs. One of the downsides to most ASP implementations is that they ground eagerly. Because executables can have a lot of evidence, it's possible this can lead to a combinatorial explosion in the number of ground rules. Because OOAnalyzer's rules involve negation and recursion in complex ways, it's hard to say how much of an issue this will be. We're able to find the optimal model on all of the toy OOAnalyzer programs, but that doesn't mean much. There are alternative approaches to ASP, like lazy grounding, but they are less mature. I'm hopeful that with cautious engineering, we can make this work on real programs, but only time will tell. In the meantime, I'm having fun exploring this new approach to the problem!

Edward J. Schwartz

Computer Security Researcher