Hello, Merry Christmas and Happy Holidays! I just completed Advent of Code 2023, and wanted to jot down a few reflections before I disappear for the holidays. Like last year, I chose to implement my solutions using Rust.
The good news is that I ran into far fewer problems with Rust this year. There
are a few reasons for this, but a lot of it just boils down to experience. I
tend to program excessively using lazy iterators, and I began to get a second
sense for when I'd need to call .collect()
to avoid referencing a freed
temporary value. Similarly, when I would get a compiler error (and yes, this
still happens a lot!), I would often immediately know how to fix the problem.
That being said, there were still occasions when the language just "got in the
way". I had a few solutions where I had to throw in copious amounts of
.clone()
calls. One salient example was on day
22
when I wanted to memoize a subproblem. But to do so, I also had to pass a value
instead of a reference to each sub-problem, which slowed things down
dramatically. Anyway, with more work, the solution probably would have been to
use references with lifetimes and just ensure that memoization happened during
the right lifetimes. But the key phrase there is with more work. And see
below
-- I was already pretty frustrated.
On this year's Advent of Code, I used Chat-GPT and Co-pilot quite heavily, as I do in practice when coding. I used Chat-GPT mostly to avoid looking up documentation or crates. For example, "How can I add a progress indicator to an iterator in Rust?" or "How can I bind a sub-pattern to a name in Rust?". Given that I hadn't programmed in Rust for a year, I was a bit rusty (ha!)
Co-pilot was also quite useful. It was pretty good at writing out boilerplate code. Advent of Code problems tend to include copious amounts of grid problems, and Co-pilot reduced a lot of the fatigue for helper functions. Define things for the X axis, and it would figure out how to implement the other cases.
I also found that Co-pilot was able to explain weird Rust borrowing problems far
better than the compiler. As I wrote about last year, Rust's borrowing rules
sometimes don't work that great for my programming style, but Co-pilot was
pretty good about explaining problems. I don't have a specific example, but
think of searching a vector using vec.iter()
, using the found value at some
point, and then later trying to move the vector. In my opinion, this definitely
makes it easier to program in Rust.
Now, Co-pilot did introduce some headaches as well. Although it was pretty
good, it would sometimes introduce problematic completions. For instance, it
really liked to complete print statements by adding an extra quote, e.g.,
println!("This is something: {:?}", something");
Not a huge deal, but it
happened enough that it was irritating.
On some level, I think everyone knows this, but it's still worth saying: You are more effective when thinking clearly. I recall from graduate school that many of my fellow students would pull all nighters and submit papers in the wee hours of the morning. I never did that. My efficiency just plummets as I get tired.
I experienced a pretty funny (in retrospect) example of this on day 22. You can read the problem, it's not really that hard. One of the nice things about programming functionally is that it usually works or completely fails. There's not a lot of in-between. In Advent of Code, this means that if your solution works on the example, it usually works on the real input (ignoring performance). But on day 22, my code did not work on the real input. The problem is pretty simple, so it was infuriating. Rather than taking a break to clear my head, I just kept trying to debug it. I stayed up far past my bed time, becoming more and more frustrated. I even completely rewrote my implementation in Python before giving up and going to bed.
The next morning, I woke up and looked at the problem again. I had a simple idea. I actually had two solutions to the problem. I had a simple but slow solution that I was confident was correct, but was too slow to use on the real input. And I had my fast solution that I thought should be correct, but wasn't. My insight was that I could make "in between" problems by just taking a portion of the real input -- say 10%. And then see if my fast solution agreed with the slow one. It didn't. I then ran Lithium and reduced the input to a minimal example. From here, it was trivial to spot the fundamental problem I had in my recursive algorithm. I fixed it, and my solution worked. This whole process probably only took 30 minutes from the time I woke up. I futilely spent hours the previous night. And the worst part is that, even in the moment, I knew I was acting foolishly. I was just too frustrated to stop.
I don't know if this is universal, but when I am stressed or frustrated, I usually make mistakes, which naturally compound my stress and frustration! There is a fine line between perseverance and stubbornness, and I crossed it on day 22.
I've been working on a blog post that has some interesting images in it. Unfortunately, my blog view is narrow, which makes it hard to see these. So, I decided to add a "click for full screen" feature to my blog. How hard could it be?
Answer: Very hard. Sadly, this took me a few days.
I hope to write a more detailed post about this later, but for now, I wanted to show off the fruits of my labor.
I can write:
<fsclick>![A wide picture](wide.jpg)</fsclick>
And you see:
Go ahead, click on it, I dare you! src
Last December, I did most of Advent of Code in Rust, which I had never used before. You can find my solutions here.
I tend to program functionally, perhaps even excessively so. I try to express
most concepts through map
, filter
, and fold
. I tend to enjoy languages
that make this easy. Fortunately, this is becoming the norm, even in
non-functional languages such as Python, Java and C++.
Perhaps it is not too surprising then that Rust, as a new language, supports this style of programming as well:
let x: i32 = (1..42).map(|x| x+1).sum();
println!("x: {x}");
What is truly amazing about Rust though is how this function code is
compiled to x86-64. At optimiation level 1, the computation of x
evaluates to
mov dword ptr [rsp + 4], 902
lea rax, [rsp + 4]
Yes, the compiler is able to unfold and simplify the entire computation, which is pretty neat. But let's look at the code at optimization level 0:
mov ecx, 1
xor eax, eax
mov dl, 1
.LBB5_1:
.Ltmp27:
movzx edx, dl
and edx, 1
add edx, ecx
.Ltmp28:
add eax, ecx
inc eax
mov ecx, edx
.Ltmp29:
cmp edx, 42
setb dl
.Ltmp30:
jb .LBB5_1
.Ltmp31:
sub rsp, 72
mov dword ptr [rsp + 4], eax
lea rax, [rsp + 4]
So our functional computation of a range, a map, and a sum (which is a reduce
)
is compiled into a pretty simple loop. And keep in mind this is at optimization
level 0.
By contrast, let's take a look at how OCaml handles this. First, the included OCaml standard library is not so great, so writing the program is more awkward:
let r = List.init 42 (fun x -> x + 1) in
let x = List.map (fun x -> x+1) r in
let x = List.fold_left (+) 0 x in
Printf.printf "x: %x\n" x
But let's look at the assembly with aggressive optimizations:
camlExample__entry:
leaq -328(%rsp), %r10
cmpq 32(%r14), %r10
jb .L122
.L123:
subq $8, %rsp
.L121:
movl $85, %ebx
movl $5, %eax
call camlExample__init_aux_432@PLT
.L124:
call caml_alloc2@PLT
.L125:
leaq 8(%r15), %rsi
movq $2048, -8(%rsi)
movq $5, (%rsi)
movq %rax, 8(%rsi)
movq camlExample__Pmakeblock_arg_247@GOTPCREL(%rip), %rdi
movq %rsp, %rbp
movq 56(%r14), %rsp
call caml_initialize@PLT
movq %rbp, %rsp
movq camlExample__Pmakeblock_arg_247@GOTPCREL(%rip), %rax
movq (%rax), %rax
call camlExample__map_503@PLT
.L126:
call caml_alloc2@PLT
.L127:
leaq 8(%r15), %rsi
movq $2048, -8(%rsi)
movq $5, (%rsi)
movq %rax, 8(%rsi)
movq camlExample__x_77@GOTPCREL(%rip), %rdi
movq %rsp, %rbp
movq 56(%r14), %rsp
call caml_initialize@PLT
movq %rbp, %rsp
movq camlExample__x_77@GOTPCREL(%rip), %rax
movq (%rax), %rax
movq 8(%rax), %rbx
movl $5, %eax
call camlExample__fold_left_558@PLT
.L128:
movq camlExample__x_75@GOTPCREL(%rip), %rdi
movq %rax, %rsi
movq %rsp, %rbp
movq 56(%r14), %rsp
call caml_initialize@PLT
movq %rbp, %rsp
movq camlExample__const_block_49@GOTPCREL(%rip), %rdi
movq camlExample__Pmakeblock_637@GOTPCREL(%rip), %rbx
movq camlStdlib__Printf__anon_fn$5bprintf$2eml$3a20$2c14$2d$2d48$5d_409_closure@GOTPCREL(%rip), %rax
call camlCamlinternalFormat__make_printf_4967@PLT
.L129:
movq camlExample__full_apply_240@GOTPCREL(%rip), %rdi
movq %rax, %rsi
movq %rsp, %rbp
movq 56(%r14), %rsp
call caml_initialize@PLT
movq %rbp, %rsp
movq camlExample__full_apply_240@GOTPCREL(%rip), %rax
movq (%rax), %rbx
movq camlExample__x_75@GOTPCREL(%rip), %rax
movq (%rax), %rax
movq (%rbx), %rdi
call *%rdi
.L130:
movl $1, %eax
addq $8, %rsp
ret
.L122:
push $34
call caml_call_realloc_stack@PLT
popq %r10
jmp .L123
In general, the Rust compiler's error messages are quite helpful. This is important, because dealing (fighting) with the borrow checker is a frequence occurrence. Unfortunately, there are some cases that, despite a lot of effort, I still don't really understand.
Here is a problem that I posted on stack overflow. It's a bit contrived, but it happened because I had a very functional solution to part 1 of an Advent of Code problem. The easiest way to solve the second part was to add a mutation.
Here is the program:
fn main(){
let v1=vec![1];
let v2=vec![3];
let mut v3=vec![];
v1.iter().map(|x|{
v2.iter().map(|y|{
v3.push(*y);
})
});
}
And here is the error:
error: captured variable cannot escape `FnMut` closure body
--> src/main.rs:6:5
|
4 | let mut v3=vec![];
| ------ variable defined here
5 | v1.iter().map(|x|{
| - inferred to be a `FnMut` closure
6 | / v2.iter().map(|y|{
7 | | v3.push(*y);
| | -- variable captured here
8 | | })
| |______^ returns a reference to a captured variable which escapes the closure body
The suggestions I received on stack overflow were basically "use loops". This was very disappointing for an example where the closures' scopes are clearly limited.
Anyway, it's still early days for Rust, so I hope that problems like this will be improved over time. Overall, it seems like a great language for doing systems development, but I still think a garbage collected language is better for daily driving.
One of the aspects I like about my job as a researcher at SEI is that I split my time between being an academic researcher and a software engineer. For example, a few years ago, my colleagues and I published a paper on how to employ logic programming to recover object-oriented abstractions from an executable. I was very proud of that paper (and still am!) because it introduces some interesting new approaches to performing binary analysis, and yet is also a usable tool (OOAnalyzer) that people are actually using to reverse engineer real software.
Although basing our system on logic programming made for an interesting paper, it also made it challenging to scale OOAnalyzer to very large and complex executables. We strugged from the beginning to achieve reasonable performance, and eventually concluded that we had to aggressively use tabling in order to achieve reasonable performance. At a high level, tabling is memoization for Prolog. Prolog does have side-effects, though, and part of the challenge for tabling implementations is to make sure that the tables are consistent.
One of the leading Prolog implementations with tabling support is XSB Prolog, which OOAnalyzer has been using since the beginning. Teri Swift, one of the XSB developers, and the leading expert on Prolog tabling, really helped us get enough performance to be able to write our paper. We were able to analyze some large and interesting programs, including mysql.exe, the MySQL client program. The largest program we were able to test in our paper was over 5 MiB.
Unfortunately, there were many programs that we weren't able to test successfully as well. One of these that really stuck in my mind was mysqld.exe. In our paper, we were able to run successfully on mysql.exe, the MySQL client program, but mysqld.exe, the daemon was much larger (22 MiB vs 5 MiB), and remained elusive.
A while after we published our paper, we started working again with Teri Swift, who was helping Jan Wielemaker of SWI Prolog add support for tabling to SWI Prolog. Teri and Jan wanted to use OOAnalyzer as a test case to make sure that SWI Prolog's tabling implementation was working correctly. Since this would mean that OOAnalyzer would have another supported Prolog engine, this was a win-win situation for us! Jan graciously helped us get OOAnalyzer running on SWI, which was useful in its own right. He also developed some invaluable debugging tools to help us understand what tabling is doing behind the scenes.
These tools, combined with a lot of profiling and rewriting iterations, eventually allowed us to improve performance bottlenecks to the point that we could successfully reason through mysqld.exe and other similarly large programs! Along the way, we battled performance bottlenecks, memory that would not be reclaimed, and other problems. But now we can analyze mysqld.exe in about 18 hours, and only using about 12 GiB of memory. And what is encouraging is that even near the end of reasoning about such a large program, the tool is still making new inferences fairly quickly.
These updates are available in the latest release of the Pharos repository. My colleague Cory Cohen recently wrote a tutorial on how to run OOAnalyzer on large executables, with all of the tricks we have learned over the years.
While working with Teri and Jan, I very slowly began to understand some of the underlying problems that would cause performance to degrade in large programs. One of the problems was that, to maintain consistency, the tabling implementation would re-compute tables from scratch. For most tables, this could be done quickly. But for others, it would take up huge amounts of time. I also realized that because of the structure of our problem, which incrementally learns new conclusions over time, the recomputations were not really needed.
I created a work-around in Prolog that I call "trigger rules". I rewrote the most expensive OOAnalyzer rules to be triggered when dependent facts were introduced, which avoided the costly (and unnecessary) recomputation of the table for all values. The conversion process is fragile and manual, but it proved that if we could avoid the costly recomputations, OOAnalyzer's performance would scale pretty well.
More recently, we've been working with Jan on some a tabling method that avoids the costly recomputations (and also avoids the fragile, error-prone manual transformations that I needed to make for trigger rules). The new type of tabling is called monotonic tabling. There are still some kinks to work out, but we are optimistic that this will further improve the performance of OOAnalyzer and make it easier to maintain.
And of course, monotonic tabling will probably make for an interesting subject of a new paper. So we've now come full circle. We wrote the research paper on OOAnalyzer. This led us to engineering challenges. And to solve the engineering challenges, it brought us back to a new research problem.
Powered with by Gatsby 5.0