Friday, July 8, 2011

Reverse Engineering

One of the things about learning to program, is that like just about anything else it takes effort to learn how to do it well. Reading a book is all very well, but one of the things about reading is that to read well takes effort. One of the things that distinguishes most of the Great Works of programming literature is that they are not easy, and one of the ways that even those works not intended to be used as college texts is the presence of exercises.

The measure of what a programmer says he has read is interesting, but the thing to really do is probe whether they have done the exercises. The Art of Computer Programming is a great set of books but of the all too few people who have read it, even fewer have even glanced at let alone done any of the exercises, which is a shame both because of the lost learning but also because in the great texts the exercises are usually so carefully constructed to take the reader beyond the main text.

[ Of course, back when I was a lad starting out, I didn't have the benefit of ready access to the Great Works (many of which hadn't been written yet, of course). Getting access to a machine was quite enough, at least circa 1980 in New Zealand, hence having to still use card decks to share a "personal" computer. ]

One of the more interesting learning exercises I went through in my early days came via the medium of computer chess, thanks to the classic early chess program Sargon II, by Dan and Kathe Spracklen, immortalized in the classic horror film The Thing as the program Kurt Russell is playing against. My father had taught me the basics of the game, but lacking many other people to play (and having discovered the distraction of computing) I didn't play enough to become any good; when a copy of Sargon II turned up I naturally gave it a whirl, and got a nasty surprise.

If you're feeling keen to get the retro vibe, you can try Sargon in an emulator, although I've linked to Sargon III instead of II as it's rather less forbidding to get started with - one small trick if you use the emulator is that the original Apple ][ had no lower-case. You have to enter moves in algebraic notation (using the board positions) as was a common convention for written chess games, and ESC toggles between the graphical board display and the move list.

Now,  there's nothing particularly earth-shaking about Sargon's play, but although when I could get enough time on an Apple ][ to play it on (which I did; I hate to think how annoying and persistent I must have been) I could beat it handily at lower levels, by difficulty level 3 it was consistently beating me - enough that I found it quite frustrating. Now, at that point I could have just played the program to get better at chess, but I didn't - instead I decided that I needed to figure out how it worked, and derive insight into its underlying playstyle that would be a better (and less painful) learning method than being repeatedly beaten.

The other advantage of studying how the program worked was that unlike playing against it, I didn't actually need time with access to the computer to study the program.

Of course, I've since discovered that the original Sargon was published as a book consisting of a commented Z80 source code listing - a common way of distributing programs in the U.S. back in those days, as exemplified by Compute! magazine - but that book never made its way to our shores. However, there were disassemblers for the 6502; indeed the Apple ][ system ROM contained one, and an even better one by Glen Bredon called Sourceror was a companion to the assembly tool Big Mac which I'd started to play with writing ever more complex 6502 assembly language of my own.

So, I loaded the Sargon II binary into memory, pointed Sourceror at it, and got a rough assembly dump, and then with some persuasion got it printed out into a big stack, which I then carried with me everywhere for a month or so at high school, spending every free moment poring over it trying to trace the program's execution and divine its logic and internal data structures (including any parts of the assembly language dump that were actually data instead of code).

While that printout is long gone, I can at least cherish the memory memory of unwinding it all and discovering how it worked; in essence, the key element - and the part of the program I spent most time trying to figure out, since that meant making suppositions about what the data structures it was using were and then tracing through the code to see if those guesses made sense - was a position evaluator that "scored" a given board layout by scanning it to see which moves were available and thus checking which pieces were threatened with capture. The sum of the weights of the remaining pieces, with weight values chosen to reflect their tactical power, adjusted by the sum of the weights of pieces threatened with capture, gave a simple score for the "desirability" of that position.

[ There was an interesting adjustment to the weights of pieces for pawns; their point weighting changed depending on how many rows across the board they had advanced, to reflect the potential value of the pawn if it could reach the end row and be queened. ]

The difficulty level of the program then controlled how deep into the tree of possible positions from the current one the program would recursively search looking for board layouts, using simple Alpha-Beta pruning to try and constrain its search; the one it found which it scored as the best future outcome at the requested depth would result in the program's choice of move. Although my printout with hundreds of scribbled annotations as I unwound the program's structure is long lost, you can get a sense of it by reading the Sargon I source, along with its original commentary.

I definitely own the Spracklens a debt of gratitude; the fact their chess program was better than I am helped make me a developer by making me mad enough to learn how to pull programs apart from the raw bytes. I might not have put enough effort in to learn how otherwise, but I'm certainly glad I did.

Otherwise, for instance, I might not have gone on to study disassemblies of the Apple ][ BASIC floating-point code (helped by "What's Where in the Apple", which I don't think I have kept although I do have some other old bits of Apple][ trivia) and learned floating-point, and come to a fuller understanding of the Taylor and Maclaurin series expansions - around the same time as this I was teaching myself calculus, but recognising the series expansion by figuring out what the tables of constants represented was an epiphany to treasure.


As I said at the start, all good learning takes effort; and although I never did pursue chess seriously, the effort of reverse-engineering that chess program took quite a bit of time and effort that really paid off. Even though it seems like a frivolous exercise and at the time I certainly had no ambition beyond the exercise itself, it had real practical value. Reverse engineering is actually a powerful skill which I put to good use in the next few years (and indeed, serves me well to this day, albeit with better tools to help the process), first writing my own development tools for embedded systems - the SC/MP and Intel 8051, notably - and then using them to write lots of embedded code and system-level stuff.

No comments:

Post a Comment