Sunday, July 10, 2011

The Byte Shop

In 1981, the first retail computer store in New Zealand opened in downtown Auckland, in Fort Street just off Queen Street; called the Byte Shop, it was run by a wonderfully enthusiastic man, the late Andrew Tearle, whose death I only just learned of which preparing to write this post.

When I was 14, I often took the bus from Glenfield College to Auckland after high school on a Friday; most of the time, my main aim was to visit the then brand-new video arcades and play a little Pinball or Galaxian. Once the Byte Shop opened though, it became a mecca for a number of young nerds like myself to gawk at the computers and the programming books.

The first couple of undergraduate programming texts I bought came from there; Fundamentals of Interactive Computer Graphics (by Newman and Sproull) came from the Byte Shop, as did Sorting and Sort Systems, by Harold Lorin, from which I learned the wonders of things like the Polyphase Merge Sort (extremely important on machines of those day, where 16Kb was still considered a lot).

Andy himself was a genuinely friendly guy, and very tolerant of we computer-obsessed youngsters using his store as a hangout to discuss programming, and I hope it worked for him having us around to help answer some of the curlier technical questions customers had about the machines.

Lots of great meetings happened there; one of the most important for me was a young man named Justin Farrelly, and while chatting we found out that he was an Air Force avionics engineer based at 5 Squadron in Hobsonville, meaning he was only a couple of kilometers away from where I lived with my father in Greenhithe just on the other side of the harbour across a bridge.

Justin and I became firm friends and worked together for many years; it was a wonderful partnership, with Justin doing electronics design and me writing software, and around the same time I finished high school Justin left the Air Force and went into business for himself with me coding for him. We did a wonderfully diverse set of embedded projects, mostly based on the Intel 8051; later on Justin took pilot training and learned to fly helicopters, and went down to Antarctica to winter over with Greenpeace to help them get access to the Antarctic treaty negotiations: in this image on the Greenpeace site, Justin is the chap on the left smiling at the camera in a blue coat.

After that Justin went all over the world and we didn't work together again, but those years from 1981 through the late 80's were incredibly productive and fun. And without Andrew Tearle at the Byte Shop, that wouldn't have happened.

Saturday, July 9, 2011

The Turbo C Overlay Manager

During the late 80's I was doing most of my work - writing point-of-sale software for the retail oil industry - in a mixture of assembly language and Turbo C, rewriting my earlier point-of-sale system (written in Turbo Pascal on the Sanyo MBC 550 not-PC-compatible) which used a cooperative multitasking system in favour of one with a proper preemptive task switcher.

Even more fun, this little doodad took advantage of the technique of coercing DOS into supporting multitasking by using the INDOS flag and the very under-documented fact that the MS-DOS kernel thoughtfully stored all the key state for the current program in a contiguous block of memory. With the address and size of this block, as long as the MS-DOS kernel wasn't busy doing I/O at the time, task switches could be done by not just saving the CPU state and switching stacks as per normal, but also saving and restoring the MS-DOS state in the "swappable data area".

With two monitors, the resulting point-of-sale system would run as a TSR, typically driving a monochrome adapter running in the background, running several serial ports (with its own keyboard) and printers while the rest of the machine was available to run a back-office program on the colour display adaptor. The next elaboration was to use the EGA adapter, but to split each colour plane out and use it to drive a separate green screen, since the task system would happily run several instances of the main point-of-sale task so we could run up to three point-of-sale registers from the same PC, which could still run the accounting software on the other display adapter.

While this worked well, as features were added to the point of sale the memory pressure in real mode started to grow, so I looked into one of the fancy features of Turbo C, what Borland called the VROOOM overlay manager.

Overlays were a crude technique in the early days; typically, functions in a Pascal program would be designated as "overlay" routines, in which case they were split out from the main executable into a separate piece managed by library code in the main routine. Before calling an overlay routine, the caller would ensure that the right overlay was loaded in first so that the code for the routine was resident in memory.

All well and good, but with one drawback; since the memory used to load each of the routines in the "overlay" sections was overwritten as different code was loaded in, if an overlaid routine called a non-overlaid routine, which then called a different overlay, things didn't work out so well.

VROOM was more clever than that, though. What some clever folk at Borland had done was a neat trick to allow routines in overlays to work safely. I believe the same fundamental idea was also picked up by real-mode Windows to manage code sections, since real-mode Windows was under really extreme memory pressure, pretty much all application code needed to be swappable which resulted in similar problems.

With some persistence and reverse-engineering, it turned out that the overlay support was mediated by the linker which glued together the final program; the linker then emitted a small thunk for each entry point in any object module which had been tagged on the linker command line for overlaying, and this thunk took care of the transparent code loading of the right overlay segment on function entry.

However, it cooperated with any code higher up in the call stack cleverly; since Borland's compiler used a consistent frame format, the overlay library walked up through the stack looking at code segment addresses which matched the overlay area. If it found one of these call records on the stack it would rewrite it, extracting the original return address and replacing it. Finally, it would then task that stack frame as one it had modified, by adjusting the saved frame pointer to set the low bit (which for normal frames would always be 0, as the 8086 stack is always 16-bit aligned).

This meant that when the flow of control eventually tried to return to some code in an evicted overlay, the rewritten return address would be called instead, which went into a thunk in the overlay manager which would reload the correct overlay segment and fix the stack back up so it was as it was originally before continuing execution.

This made the overlay process pretty transparent, and reasonably reliable, although since it worked at a linker level it would sometimes take some care ensuring that the right routines were bundled together so the call hotspots didn't try thrashing overlay loads.

One remaining problem with it though, was that it still needed a lot of space for the overlay region itself (as much as the biggest segment it might load, which in practice for a program big enough to need overlaying, typically meant 64kb). That was a particular problem for our point-of-sale TSR.

So, having reverse-engineered this I decided to get creative and write my own overlay manager, which took things to the next level by putting the overlay in an unusual place: the Expanded Memory System page frame. My replacement overlay manager preloaded all the overlays into Expanded Memory areas, and then handled the overlay loads by remapping the EMS page frame to the right one for both zippy performance and no additional burden on MS-DOS real memory.

And since I had a preemptive multitasker which was already managing the DOS state switching, I then expanded that to track the EMS page frame state as well, so that my application which was running as a TSR could run concurrently with the main program even if the main program was using EMS as well. Hey presto, we could then run our point-of-sale in the background with something like 96kb of real memory load; 32k of core C library and overlay manager plus a 64kb data segment for the point-of-sale task instance.

And since one of the things I wanted to do around that time was write my own UNIX kernel for a 68k board to replace all this DOS malarkey, when I was working on that I could use this technique to help get GCC running in real-mode MS-DOS, although even with the overlay manager running out of the page frame, GCC still wouldn't fit. Fortunately, that too turned out to be a solvable problem....

Friday, July 8, 2011

Reverse Engineering

One of the things about learning to program, is that like just about anything else it takes effort to learn how to do it well. Reading a book is all very well, but one of the things about reading is that to read well takes effort. One of the things that distinguishes most of the Great Works of programming literature is that they are not easy, and one of the ways that even those works not intended to be used as college texts is the presence of exercises.

The measure of what a programmer says he has read is interesting, but the thing to really do is probe whether they have done the exercises. The Art of Computer Programming is a great set of books but of the all too few people who have read it, even fewer have even glanced at let alone done any of the exercises, which is a shame both because of the lost learning but also because in the great texts the exercises are usually so carefully constructed to take the reader beyond the main text.

[ Of course, back when I was a lad starting out, I didn't have the benefit of ready access to the Great Works (many of which hadn't been written yet, of course). Getting access to a machine was quite enough, at least circa 1980 in New Zealand, hence having to still use card decks to share a "personal" computer. ]

One of the more interesting learning exercises I went through in my early days came via the medium of computer chess, thanks to the classic early chess program Sargon II, by Dan and Kathe Spracklen, immortalized in the classic horror film The Thing as the program Kurt Russell is playing against. My father had taught me the basics of the game, but lacking many other people to play (and having discovered the distraction of computing) I didn't play enough to become any good; when a copy of Sargon II turned up I naturally gave it a whirl, and got a nasty surprise.

If you're feeling keen to get the retro vibe, you can try Sargon in an emulator, although I've linked to Sargon III instead of II as it's rather less forbidding to get started with - one small trick if you use the emulator is that the original Apple ][ had no lower-case. You have to enter moves in algebraic notation (using the board positions) as was a common convention for written chess games, and ESC toggles between the graphical board display and the move list.

Now,  there's nothing particularly earth-shaking about Sargon's play, but although when I could get enough time on an Apple ][ to play it on (which I did; I hate to think how annoying and persistent I must have been) I could beat it handily at lower levels, by difficulty level 3 it was consistently beating me - enough that I found it quite frustrating. Now, at that point I could have just played the program to get better at chess, but I didn't - instead I decided that I needed to figure out how it worked, and derive insight into its underlying playstyle that would be a better (and less painful) learning method than being repeatedly beaten.

The other advantage of studying how the program worked was that unlike playing against it, I didn't actually need time with access to the computer to study the program.

Of course, I've since discovered that the original Sargon was published as a book consisting of a commented Z80 source code listing - a common way of distributing programs in the U.S. back in those days, as exemplified by Compute! magazine - but that book never made its way to our shores. However, there were disassemblers for the 6502; indeed the Apple ][ system ROM contained one, and an even better one by Glen Bredon called Sourceror was a companion to the assembly tool Big Mac which I'd started to play with writing ever more complex 6502 assembly language of my own.

So, I loaded the Sargon II binary into memory, pointed Sourceror at it, and got a rough assembly dump, and then with some persuasion got it printed out into a big stack, which I then carried with me everywhere for a month or so at high school, spending every free moment poring over it trying to trace the program's execution and divine its logic and internal data structures (including any parts of the assembly language dump that were actually data instead of code).

While that printout is long gone, I can at least cherish the memory memory of unwinding it all and discovering how it worked; in essence, the key element - and the part of the program I spent most time trying to figure out, since that meant making suppositions about what the data structures it was using were and then tracing through the code to see if those guesses made sense - was a position evaluator that "scored" a given board layout by scanning it to see which moves were available and thus checking which pieces were threatened with capture. The sum of the weights of the remaining pieces, with weight values chosen to reflect their tactical power, adjusted by the sum of the weights of pieces threatened with capture, gave a simple score for the "desirability" of that position.

[ There was an interesting adjustment to the weights of pieces for pawns; their point weighting changed depending on how many rows across the board they had advanced, to reflect the potential value of the pawn if it could reach the end row and be queened. ]

The difficulty level of the program then controlled how deep into the tree of possible positions from the current one the program would recursively search looking for board layouts, using simple Alpha-Beta pruning to try and constrain its search; the one it found which it scored as the best future outcome at the requested depth would result in the program's choice of move. Although my printout with hundreds of scribbled annotations as I unwound the program's structure is long lost, you can get a sense of it by reading the Sargon I source, along with its original commentary.

I definitely own the Spracklens a debt of gratitude; the fact their chess program was better than I am helped make me a developer by making me mad enough to learn how to pull programs apart from the raw bytes. I might not have put enough effort in to learn how otherwise, but I'm certainly glad I did.

Otherwise, for instance, I might not have gone on to study disassemblies of the Apple ][ BASIC floating-point code (helped by "What's Where in the Apple", which I don't think I have kept although I do have some other old bits of Apple][ trivia) and learned floating-point, and come to a fuller understanding of the Taylor and Maclaurin series expansions - around the same time as this I was teaching myself calculus, but recognising the series expansion by figuring out what the tables of constants represented was an epiphany to treasure.

As I said at the start, all good learning takes effort; and although I never did pursue chess seriously, the effort of reverse-engineering that chess program took quite a bit of time and effort that really paid off. Even though it seems like a frivolous exercise and at the time I certainly had no ambition beyond the exercise itself, it had real practical value. Reverse engineering is actually a powerful skill which I put to good use in the next few years (and indeed, serves me well to this day, albeit with better tools to help the process), first writing my own development tools for embedded systems - the SC/MP and Intel 8051, notably - and then using them to write lots of embedded code and system-level stuff.