Saturday, July 9, 2011

The Turbo C Overlay Manager

During the late 80's I was doing most of my work - writing point-of-sale software for the retail oil industry - in a mixture of assembly language and Turbo C, rewriting my earlier point-of-sale system (written in Turbo Pascal on the Sanyo MBC 550 not-PC-compatible) which used a cooperative multitasking system in favour of one with a proper preemptive task switcher.

Even more fun, this little doodad took advantage of the technique of coercing DOS into supporting multitasking by using the INDOS flag and the very under-documented fact that the MS-DOS kernel thoughtfully stored all the key state for the current program in a contiguous block of memory. With the address and size of this block, as long as the MS-DOS kernel wasn't busy doing I/O at the time, task switches could be done by not just saving the CPU state and switching stacks as per normal, but also saving and restoring the MS-DOS state in the "swappable data area".

With two monitors, the resulting point-of-sale system would run as a TSR, typically driving a monochrome adapter running in the background, running several serial ports (with its own keyboard) and printers while the rest of the machine was available to run a back-office program on the colour display adaptor. The next elaboration was to use the EGA adapter, but to split each colour plane out and use it to drive a separate green screen, since the task system would happily run several instances of the main point-of-sale task so we could run up to three point-of-sale registers from the same PC, which could still run the accounting software on the other display adapter.

While this worked well, as features were added to the point of sale the memory pressure in real mode started to grow, so I looked into one of the fancy features of Turbo C, what Borland called the VROOOM overlay manager.

Overlays were a crude technique in the early days; typically, functions in a Pascal program would be designated as "overlay" routines, in which case they were split out from the main executable into a separate piece managed by library code in the main routine. Before calling an overlay routine, the caller would ensure that the right overlay was loaded in first so that the code for the routine was resident in memory.

All well and good, but with one drawback; since the memory used to load each of the routines in the "overlay" sections was overwritten as different code was loaded in, if an overlaid routine called a non-overlaid routine, which then called a different overlay, things didn't work out so well.

VROOM was more clever than that, though. What some clever folk at Borland had done was a neat trick to allow routines in overlays to work safely. I believe the same fundamental idea was also picked up by real-mode Windows to manage code sections, since real-mode Windows was under really extreme memory pressure, pretty much all application code needed to be swappable which resulted in similar problems.

With some persistence and reverse-engineering, it turned out that the overlay support was mediated by the linker which glued together the final program; the linker then emitted a small thunk for each entry point in any object module which had been tagged on the linker command line for overlaying, and this thunk took care of the transparent code loading of the right overlay segment on function entry.

However, it cooperated with any code higher up in the call stack cleverly; since Borland's compiler used a consistent frame format, the overlay library walked up through the stack looking at code segment addresses which matched the overlay area. If it found one of these call records on the stack it would rewrite it, extracting the original return address and replacing it. Finally, it would then task that stack frame as one it had modified, by adjusting the saved frame pointer to set the low bit (which for normal frames would always be 0, as the 8086 stack is always 16-bit aligned).

This meant that when the flow of control eventually tried to return to some code in an evicted overlay, the rewritten return address would be called instead, which went into a thunk in the overlay manager which would reload the correct overlay segment and fix the stack back up so it was as it was originally before continuing execution.

This made the overlay process pretty transparent, and reasonably reliable, although since it worked at a linker level it would sometimes take some care ensuring that the right routines were bundled together so the call hotspots didn't try thrashing overlay loads.

One remaining problem with it though, was that it still needed a lot of space for the overlay region itself (as much as the biggest segment it might load, which in practice for a program big enough to need overlaying, typically meant 64kb). That was a particular problem for our point-of-sale TSR.

So, having reverse-engineered this I decided to get creative and write my own overlay manager, which took things to the next level by putting the overlay in an unusual place: the Expanded Memory System page frame. My replacement overlay manager preloaded all the overlays into Expanded Memory areas, and then handled the overlay loads by remapping the EMS page frame to the right one for both zippy performance and no additional burden on MS-DOS real memory.

And since I had a preemptive multitasker which was already managing the DOS state switching, I then expanded that to track the EMS page frame state as well, so that my application which was running as a TSR could run concurrently with the main program even if the main program was using EMS as well. Hey presto, we could then run our point-of-sale in the background with something like 96kb of real memory load; 32k of core C library and overlay manager plus a 64kb data segment for the point-of-sale task instance.

And since one of the things I wanted to do around that time was write my own UNIX kernel for a 68k board to replace all this DOS malarkey, when I was working on that I could use this technique to help get GCC running in real-mode MS-DOS, although even with the overlay manager running out of the page frame, GCC still wouldn't fit. Fortunately, that too turned out to be a solvable problem....

No comments:

Post a Comment