sqpat
Experienced Member
I've spent the past year porting DOOM (a 32-bit x86 program) to 16-bit real-mode x86 code. The architecture is about where it needs to be, and I'm now digging into the rendering code where most of the CPU time is spent, so I need all the performance I can get. Long runs of pixels are drawn at a time, with texture u/v coordinates calculated per pixel, and I've found that having BP and SP available to hold values lets me get away with 3 memory accesses per pixel instead of 5. That adds up: 320 x 200 resolution means as many as 64000 pixels per frame, at multiple frames per second. Of course I can only do this if I wrap the relevant code in cli/sti, and I've worked that out to about 5000 cycles (240 memory accesses plus some ADDs and ANDs) on an 8088 for a worst-case run of 80 pixels. Honestly it may be worse if the prefetch queue struggles, but no instruction is longer than 2 bytes.
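As a rough sanity check on those figures, here is a small Python sketch of the arithmetic. The 4.77 MHz clock and the default 18.2 Hz timer tick are assumptions about a stock IBM PC/XT and are not stated above; the function names are purely illustrative.

```python
# Back-of-the-envelope check of the interrupt-masked window.
# Assumptions (not from the post): stock 4.77 MHz 8088 and the default
# 18.2 Hz system timer tick on IRQ0.
CPU_HZ = 4_772_727
TIMER_TICK_HZ = 18.2065


def run_accesses(pixels, accesses_per_pixel):
    """Memory accesses needed for one run of pixels."""
    return pixels * accesses_per_pixel


def cycles_to_ms(cycles, cpu_hz=CPU_HZ):
    """Convert a cycle count to milliseconds at the given clock rate."""
    return cycles / cpu_hz * 1000.0


accesses_bp_sp = run_accesses(80, 3)  # with BP and SP holding values
accesses_plain = run_accesses(80, 5)  # without them
masked_ms = cycles_to_ms(5000)        # time spent between cli and sti
tick_ms = 1000.0 / TIMER_TICK_HZ      # period of the default timer tick

print(f"accesses per 80-pixel run: {accesses_bp_sp} vs {accesses_plain}")
print(f"interrupts masked for about {masked_ms:.2f} ms")
print(f"default timer tick period is about {tick_ms:.1f} ms")
```

Under those assumptions the worst-case masked window is roughly 1 ms, a small fraction of the default timer period. Any interrupt source that fires much more often than the default tick would have less headroom, so the same comparison would need redoing with its actual rate.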
I'm travelling and away from my old hardware at the moment, so I can't test for sure, but emulators like 86Box don't seem to have any issues. Does anyone have experience with anything like this? I haven't implemented sound yet, and I assume that could cause trouble eventually, since I think sound hardware uses frequent interrupts.