Possible to do texture mapped pseudo 3D rendering on an 8088 / CGA machine?

jhhoward (Member, joined Sep 1, 2019, 39 messages)
Over the years I've developed several pseudo 3D renderers similar to Wolfenstein 3D for various bits of hardware. The most recent is Catacombs Of The Damned! for the Arduboy, which runs on an ATmega chip with only 2.5KB of RAM. I also developed a demo for the Uzebox, another microcontroller-based console, which works without a framebuffer and generates a composite signal on the fly in software.

I recently acquired a Sharp PC-3000, which is an XT-compatible, DOS-based palmtop computer (although higher clocked), and was considering the possibility of writing a 3D renderer for it. I was hoping to pick the brains of those who are familiar with XT-class hardware!

For a high level overview of what I was considering:
  • The 3D effect will be built using vertical 'slices' similar to Wolf3D. Could use raycasting, BSP or portals to build a 1-D array which stores for each screen column, the scale of the wall and which slice of texture to use.
  • Each slice can then be rendered by using a 'coded scaler', similar to Wolf3D (it is described in detail in Fabien Sanglard's Wolfenstein Black Book). Essentially, at load time, you generate a set of specialised functions, one per wall slice height, each of which draws its slice as an unrolled loop, so that you avoid calculating the scaling at run time.
  • Sprites are drawn in a similar way to walls by rendering slices with scaler functions, but with some changes to handle transparency.
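
The overview above can be sketched in portable C. This is only an illustration under stated assumptions, not the Wolf3D code or the poster's renderer: the names `Column`, `scale_map`, `build_scalers` and `draw_slice`, the 128x128 logical viewport and the 16-texel texture are all hypothetical. Portable C cannot emit machine code at load time, so a lookup table per slice height stands in for the generated scaler functions; a real coded scaler would bake the same row mapping into unrolled instructions.

```c
#include <stdint.h>

#define VIEW_W   128   /* logical viewport width (assumed)  */
#define VIEW_H   128   /* logical viewport height (assumed) */
#define TEX_SIZE 16    /* hypothetical texture size in texels */

/* Per-column result of the raycast / BSP / portal pass. */
typedef struct {
    uint8_t height;   /* wall slice height in pixels, 0 = nothing to draw */
    uint8_t tex_col;  /* which vertical slice of the texture to use */
} Column;

/* scale_map[h][y] = texel row sampled by output row y of a slice h pixels tall.
   Built once at load time; stands in for one generated scaler per height. */
uint8_t scale_map[VIEW_H + 1][VIEW_H];

void build_scalers(void)
{
    for (int h = 1; h <= VIEW_H; h++)
        for (int y = 0; y < h; y++)
            scale_map[h][y] = (uint8_t)((y * TEX_SIZE) / h);
}

/* Draw one vertically centred wall slice into a 1-byte-per-pixel buffer.
   tex is stored column-major: tex[tex_col * TEX_SIZE + row]. */
void draw_slice(uint8_t *buf, int x, const Column *c, const uint8_t *tex)
{
    int h = c->height;
    if (h == 0)
        return;
    if (h > VIEW_H)
        h = VIEW_H;               /* sketch only: clamp instead of clipping */
    int top = (VIEW_H - h) / 2;
    const uint8_t *map = scale_map[h];
    for (int y = 0; y < h; y++)   /* no multiply or divide per pixel */
        buf[(top + y) * VIEW_W + x] = tex[c->tex_col * TEX_SIZE + map[y]];
}
```

The frame loop would call `build_scalers()` once, fill an array of `VIEW_W` `Column` entries per frame, and then call `draw_slice` for each column.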

It certainly looks like it could be too much for such a limited system, so here are some of my thoughts of corner cutting:
  • The first easy win: render the 3D portion in a smaller window at e.g. 256x128 (which works out to about 50% of the framebuffer pixels) with the rest of the screen having a nice border, status bar etc
  • The renderer could be set up to support both 320x200 4 colour and 160x200 composite colour modes. In the 4 colour RGB mode, use vertical stripe / dither patterns to fake extra colours. Effectively render at 160x200 in both modes. With the reduced size window this is actually 128x128 logical size.
  • Rendering at this resolution means that instead of 2 bits per pixel, it is effectively like rendering at 4 bits per pixel / one nybble per pixel. This still complicates all the rendering functions somewhat: to write a pixel, you have to first read the relevant byte, mask it, shift your value, OR it, write it back. Since the CGA card only has enough VRAM to store one framebuffer, for a renderer like this you would need to store a buffer in main RAM and then copy the contents to VRAM once you have finished. It then struck me: the buffer in main RAM could be stored at 1 byte per pixel (which works out to 16K for the 256x128 window, i.e. 128x128 logical pixels), and the routine that blits to VRAM could be a big unrolled loop which squashes it back down into the 4 bits per pixel format as it writes to VRAM. With some special care, the scaling routines could output either the high or low nybble depending on whether it is an even or odd column, so the inner loop of the blit routine could be as simple as 2 word reads, 2 OR instructions, and one word write.
  • If the texture maps are simplified, e.g. all walls have a very basic brick pattern, then the texture details themselves can be coded right into the wall scaler functions so texture details don't have to be loaded from RAM.
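
The byte-per-pixel buffer and squashing blit described above could look roughly like this in C. It is a sketch with hypothetical names (`nybble_for_column`, `blit_row`), working a byte at a time rather than with the word reads and writes an 8088 assembly version would use. It also ignores the fact that CGA graphics memory stores even and odd scanlines in two separate banks, so a real blit would need to compute the destination address per row.

```c
#include <stddef.h>
#include <stdint.h>

/* What a scaler would store for a pixel in column x: the colour is
   pre-positioned in the high nybble for even columns and in the low
   nybble for odd columns, so the blit never has to shift. */
uint8_t nybble_for_column(int x, uint8_t colour)
{
    return (x & 1) ? (uint8_t)(colour & 0x0F)
                   : (uint8_t)((colour & 0x0F) << 4);
}

/* Squash one row from 1 byte per pixel down to 4 bits per pixel.
   Each output byte is just the OR of two neighbouring source bytes. */
void blit_row(uint8_t *dst, const uint8_t *src, size_t width_px)
{
    for (size_t i = 0; i < width_px / 2; i++)
        dst[i] = (uint8_t)(src[2 * i] | src[2 * i + 1]);
}
```

The trade-off is memory for speed: the back buffer is twice the size of the packed frame, but no read-modify-write is needed while rendering.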

Does anyone with experience with the real hardware think that this would all be possible to do at an interactive framerate? Any obvious shortfalls or better ways of approaching the above?
 
I would look at the 'hacked' text modes that give you 16 color lo-res graphics. Lowering the resolution is going to improve your frame rate simply by reducing the number of pixels you have to render. You can get 160x100 in 16 colors, but to run well on a stock IBM-PC, you might even consider 80x50. At 80x50 you wouldn't need to worry about screen snow, you would get double buffering, and you could get acceptable quality at a high enough frame rate to be fun.

Here is my 2.5 D raycaster on a 1 MHz Apple II at 40x48 resolution in 16 colors at around 20-25 fps: https://www.youtube.com/watch?v=QUN5CSWiLaw&t=3s

I think you could pull this off at a slightly higher resolution on a 4.77 MHz 8088.
 
Certainly possible, especially if you're willing to compromise on resolution. resman's idea about using text mode is a good one; in particular, I'd go for the 80x50 approach, since that means each color/attribute byte corresponds to two pixels in the same column, so there's no need to worry about masking/shifting/etc. and the whole thing becomes much simpler, especially since framebuffer access would no longer need to be a read-modify-write process.
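
One way to read the 80x50 suggestion, sketched in C: this assumes standard 80x25 text mode with every cell's character preset to CP437 0xDF (upper half block), so the foreground colour is the top pixel and the background colour is the bottom pixel. Using all 16 background colours also assumes blinking has been disabled. The function names are hypothetical.

```c
#include <stdint.h>

#define TEXT_COLS        80
#define HALF_BLOCK_UPPER 0xDF  /* CP437: top half = foreground, bottom = background */

/* Fill a text page, laid out as (character, attribute) byte pairs,
   with half blocks and black attributes. */
void init_page(uint8_t *page, int rows)
{
    for (int i = 0; i < rows * TEXT_COLS; i++) {
        page[i * 2]     = HALF_BLOCK_UPPER;
        page[i * 2 + 1] = 0x00;
    }
}

/* Pack two vertically stacked pixels (colours 0..15) into one attribute byte. */
uint8_t pack_pair(uint8_t top, uint8_t bottom)
{
    return (uint8_t)(((bottom & 0x0F) << 4) | (top & 0x0F));
}

/* Set both pixels of column x in text row `row` with one plain byte write.
   The character byte is never touched after init_page. */
void put_pair(uint8_t *page, int x, int row, uint8_t top, uint8_t bottom)
{
    page[(row * TEXT_COLS + x) * 2 + 1] = pack_pair(top, bottom);
}
```

Because a vertical wall slice naturally produces the top and bottom pixel of a cell together, each cell needs only a single store and no masking.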
 
Come to think of it, if you wanted to get super fancy, you could use one of the hacked text modes to get 8x4 or 8x2 character cells (80x50 or 80x100) and use the halftone characters in code page 437 to blend between two colors for a total of...um, 376 "colors" :lol:
 
Thanks for the feedback! @resman your raycaster is really impressive!

I wrote a test which did the most basic wall scaler (just solid colour) and the combine+blit to screen routine that I described. On my Sharp PC3K (clocked at 10MHz) it takes ~46ms for the scaler to fill the screen and ~86ms to do the blit. This only reaches ~7FPS which isn't very promising since it isn't even doing the actual wall calculations yet!

I initially considered text mode sort of 'cheating' but after some consideration it might be the best option. Having enough VRAM available to do page flipping and writing directly to VRAM would be the biggest win I think.
 
If only there were a full-screen CGA mode with linear memory addressing, 256 colours (one byte per pixel), and two pages so you could display one while rendering the other. Even though the resolution would have to be very low (like 80x100) in order to fit two pages into 16kB, it would be ideal for this sort of thing!

That reminds me - there's a project that I started and really should finish...
 
I put together a simple tech demo to see what a renderer using CGA text mode would look like. I get ~20FPS on my Sharp PC 3000 and HP 100LX although I will probably need to change the colours to work better on the LCD screen. There is nothing to stop CGA snow which might be a problem on a real CGA card.

Here is a video clip from DosBox-X
 
That looks pretty good! At that resolution you could use the 40-column text mode instead of the 80-column one, the vertical half character instead of the horizontal half, and reprogram the CRTC to give you 50 rows instead of 25. That would solve the snow problem, though it would be more difficult to write the HEALTH/MANA/SCORE labels.
 
Does CGA snow only happen when in 80 column mode? I thought it was a general text mode issue. The only problem with using vertical half characters is it would complicate the rendering functions as everything is rendered in vertical spans.
 
> ...though it would be more difficult to write the HEALTH/MANA/SCORE labels.

Which is why I'd consider ditching scorekeeping altogether and tracking health/mana as simple bars, like all modern games do. That would give you more screen for some other type of data too, like, say, consumables? Keys? That sort of stuff.

I've been considering my own projection routine (which is NOT raycasting) for something along these lines. I've still got to play more with the idea. I have a VERY different approach that may or may not be faster than raycasting, and it's related to how I handle 360 degree projections where, well... I use atan2.
 
Ooh, dish! :D
 
Very cool demo! The snow is only an issue in 80 column mode, so you have a decision to make: a little more complexity to use the half-characters in 40 column mode, snow check in 80 columns, or just let the snow happen. Personally I would go with the 40 column mode with the half character option. You can draw the first column with a plain write that zeroes out the second column, so you only need an OR when drawing the second column to merge it with the first. That is slightly slower, but waiting for horizontal retrace in 80 column mode to avoid snow would kill performance.
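
A minimal C sketch of the write-then-OR idea, assuming 40-column text mode reprogrammed to 50 rows with every cell's character preset to CP437 0xDE (right half block), so the background nybble is the left pixel and the foreground nybble is the right pixel. The function names and the full-height colour arrays are hypothetical.

```c
#include <stdint.h>

#define CELL_COLS 40   /* 40-column text mode */
#define CELL_ROWS 50   /* assumes the CRTC has been reprogrammed for 50 rows */

/* Offset of the attribute byte of cell (cx, row) in a page of
   (character, attribute) byte pairs. */
unsigned attr_offset(int cx, int row)
{
    return (unsigned)(row * CELL_COLS + cx) * 2u + 1u;
}

/* Left pixel column: a plain write that also clears the right pixel,
   so no stale data from the previous frame survives. */
void draw_left_column(uint8_t *page, int cx, const uint8_t *colours)
{
    for (int row = 0; row < CELL_ROWS; row++)
        page[attr_offset(cx, row)] = (uint8_t)((colours[row] & 0x0F) << 4);
}

/* Right pixel column: read-modify-write, but only an OR is needed. */
void draw_right_column(uint8_t *page, int cx, const uint8_t *colours)
{
    for (int row = 0; row < CELL_ROWS; row++)
        page[attr_offset(cx, row)] |= (uint8_t)(colours[row] & 0x0F);
}
```

Note that the left column has to cover the full cell height (ceiling, wall and floor) for the clearing to work, which is why both functions take a colour for every row.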
 
It's faster to build both columns in the register, then just lay down the byte and attribute as a word. And yes, even though that's overhead, it's still much faster than waiting for snow. The 256-color plasma in 8088 MPH was 60fps until we had to add snow avoidance, which brought the framerate under 20fps.

Fun fact: There's snow in 80-column text mode even if you're reading/writing to a hidden text page.
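
Building both columns in a register and storing character plus attribute as one word might look like this in C. It is a sketch with hypothetical names, again assuming CP437 0xDE (right half block) as the cell character. On the little-endian 8088 a 16-bit store puts the low byte (character) at the even address and the high byte (attribute) at the odd address, which matches the text memory layout.

```c
#include <stdint.h>

#define HALF_BLOCK_RIGHT 0xDE  /* CP437: right half = foreground, left = background */

/* Combine two horizontally adjacent pixels into a full text cell:
   character in the low byte, attribute in the high byte. */
uint16_t make_cell(uint8_t left, uint8_t right)
{
    uint8_t attr = (uint8_t)(((left & 0x0F) << 4) | (right & 0x0F));
    return (uint16_t)((attr << 8) | HALF_BLOCK_RIGHT);
}

/* Draw two adjacent pixel columns at once: one 16-bit store per cell,
   no read from video memory at all. */
void draw_column_pair(uint16_t *page, int cx, int cols, int rows,
                      const uint8_t *left, const uint8_t *right)
{
    for (int row = 0; row < rows; row++)
        page[row * cols + cx] = make_cell(left[row], right[row]);
}
```

The cost is that both columns' colours must be known before the store, which is the point debated in the replies that follow.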
 
> It's faster to build both columns in the register, then just lay down the byte and attribute as a word

That might be fine with blitting and pre-calculated data, but I don't think that's viable with a raycaster where you're basically doing a Bresenham algo on the texture for each column (though if the memory is available, I'd try to pre-render that). You'd have to run both columns' texture scaling at the same time if you wanted to keep the write value in a single register -- which would result in more memory access, since there just aren't enough registers to run TWO texture scalers at the same time.

A two-column (one byte width, height length) buffer might be the better option since conventional memory reads/writes faster. Write the first column to the buffer flat (which would be fast), then have the alternating columns read it, AND the new column data, then write it to video memory. Then you only need byte-width operations in the output as well, which if the target is 8088 avoiding the POINTLESS character write could be well worth it.

That would give you really fast even columns, and slower but acceptable odd columns. You might even consider creating your casts in different arrays and just using a conventional memory buffer for the whole viewport containing just the odd columns before sprites, EVEN if you have multiple pages to work with. Just as you'd likely have an x-indexed z-buffer for figuring out when to -- and when not to -- draw sprites.
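
For reference, the "Bresenham algo on the texture" mentioned in this post can be sketched as an integer error accumulator. This is a generic illustration in C with a hypothetical function name, not code from any of the projects discussed; it assumes `out_len` is greater than zero.

```c
#include <stdint.h>

/* Stretch or shrink a texture column of tex_len texels onto out_len
   output pixels. The error term replaces a per-pixel multiply/divide:
   out[y] ends up sampling texel floor(y * tex_len / out_len). */
void scale_column(uint8_t *out, int out_len, const uint8_t *tex, int tex_len)
{
    int err = 0;   /* running remainder of y * tex_len modulo out_len */
    int t = 0;     /* current texel index */
    for (int y = 0; y < out_len; y++) {
        out[y] = tex[t];
        err += tex_len;
        while (err >= out_len) {   /* step one or more texels forward */
            err -= out_len;
            t++;
        }
    }
}
```

Each column being scaled needs its own `err` and `t` (plus source and destination pointers), which gives a feel for the register pressure of running two scalers side by side on an 8088.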
 
Traditional wolf3d-style raycasters use pre-scaled wall segments. If the OP maintains pre-scaled segments in two formats (upper nybble, lower nybble), the byte could be built in the register, then written in one operation.
 
Honestly, I have to wonder if it wouldn't be pretty straightforward to calculate/select the scaling for two columns at a time, and then build them in the register simultaneously all down the column. The 8086 has room enough to do it without having to juggle, I'd think.

Still do like the property of 80-column 80x50 mode where it's one column per byte, though.
 
I'm not completely attached to the old school status bar. Catacombs of the Damned on Arduboy uses a simple bar to show health / mana. This is what it looks like on the Arduboy at 128x64 resolution:
demo.gif


The code actually doesn't use ray casting, it projects the two ends of the wall and linearly interpolates. You can see the code for the Arduboy version here if you are interested.
In that version, instead of using image based textures, it has a vector pattern for the wall texture which helps prevent aliasing that you get when texture sampling a far wall.
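
A rough C sketch of the project-and-interpolate idea, not the Arduboy code itself: it assumes the two wall ends have already been projected to a screen column and a wall height each, and that it is the projected height that gets interpolated. The function name, the 128-column viewport and the 8.8 fixed-point format are hypothetical choices.

```c
#include <stdint.h>

#define VIEW_W 128   /* viewport width in columns (assumed) */

/* Fill heights[] for screen columns x0..x1 (inclusive) by linearly
   interpolating between the projected heights h0 and h1 (0..255) of
   the two wall ends. 8.8 fixed point keeps the loop to one add per column. */
void interpolate_wall(uint8_t *heights, int x0, int h0, int x1, int h1)
{
    if (x1 < x0)
        return;                        /* back-facing or degenerate: skipped here */
    int32_t h = (int32_t)h0 * 256;
    int32_t step = (x1 > x0) ? ((int32_t)(h1 - h0) * 256) / (x1 - x0) : 0;
    for (int x = x0; x <= x1; x++) {
        if (x >= 0 && x < VIEW_W)      /* clip to the viewport */
            heights[x] = (uint8_t)(h / 256);
        h += step;
    }
}
```

Projected height is proportional to inverse distance, which changes linearly across the screen along a straight wall, so interpolating height per column should match a per-column raycast up to rounding. Texture coordinates do not interpolate linearly in the same way and are left out of this sketch.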

I did some experimentation for the CGA version: instead of the vector based approach I tried image based sampling with mipmapping which worked ok but didn't look great with such a low resolution. The current solution in the video I shared works by having a set of precalculated routines for each size of wall slice, as @Trixter described. Because of the simplicity of the wall texture, it actually works out that each routine is simply 20 MOV instructions in an unrolled loop, which I don't believe could be any faster!

The problem I have with using the 40 column mode (or any 'hacked' CGA mode) is that they don't work properly on my Sharp PC 3000 or HP 100LX which are the only real hardware I have to run on. I'd rather support those machines since they are the ones that I can actually test on real hardware. The snow is a real pain since I'm not even writing directly to the active text page!

I suppose I could have an 'anti-snow' option where I render to a buffer in conventional memory and then copy to VRAM during the VBLANK. This will likely slow things down a whole lot though. One crazy idea is that the copy to VRAM during the VBLANK could happen on an interrupt whilst rendering out the next frame to another buffer. The interrupt would have to poll the state of the CGA registers though, so it wouldn't be ideal. It would be better than just idly waiting for the VBLANK though.
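
The simple (idle-waiting) version of that anti-snow option could be sketched as follows. The CGA status register at port 0x3DA reports vertical retrace in bit 3; everything else here is an assumption for illustration: the `port_read_fn` hook stands in for a DOS compiler's port-input routine, and `fake_status_port` is a simulated register so the sketch can be exercised on a non-DOS host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CGA_STATUS_PORT 0x3DA
#define CGA_VRETRACE    0x08   /* status bit 3: vertical retrace in progress */

/* Hypothetical hook for reading an I/O port. */
typedef uint8_t (*port_read_fn)(uint16_t port);

/* Block until the start of the next vertical retrace. */
void wait_for_vblank_start(port_read_fn read_port)
{
    while (read_port(CGA_STATUS_PORT) & CGA_VRETRACE) { }     /* leave any retrace already under way */
    while (!(read_port(CGA_STATUS_PORT) & CGA_VRETRACE)) { }  /* wait for the next one to begin */
}

/* Copy the finished frame from conventional memory to video memory
   once retrace has started. */
void present_frame(uint8_t *vram, const uint8_t *backbuf, size_t bytes,
                   port_read_fn read_port)
{
    wait_for_vblank_start(read_port);
    memcpy(vram, backbuf, bytes);
}

/* Simulated status register for host-side testing only:
   reports retrace on every 4th read. */
unsigned fake_reads;
uint8_t fake_status_port(uint16_t port)
{
    (void)port;
    return (++fake_reads % 4 == 0) ? CGA_VRETRACE : 0;
}
```

This sketch does not check that the copy actually finishes before retrace ends, which is the real constraint on an XT-class machine and the reason the post expects a slowdown.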

PS how long do I have to wait before my account can post without having to first be approved by a moderator?
 
> The code actually doesn't use ray casting, it projects the two ends of the wall and linearly interpolates.
So very close to how I was going to approach something similar.

I really like the "roll" effect, which likely wasn't that hard to do (simple change of the origin on each column). Makes it more impressive IMHO than a lot of higher resolution implementations.

> it actually works out that each routine is simply 20 MOV instructions in an unrolled loop, which I don't believe could be any faster!
Probably could be tweaked a bit more if "don't write what you don't have to" were in place, but that often involves frame to frame tracking.

> The problem I have with using the 40 column mode (or any 'hacked' CGA mode) is that they don't work properly on my Sharp PC 3000 or HP 100LX which are the only real hardware I have to run on. I'd rather support those machines since they are the ones that I can actually test on real hardware. The snow is a real pain since I'm not even writing directly to the active text page!

Really? Are you able to run the 80 column ones like Paku Paku, or do they fail? If they fail, what's the video implementation? My Sharp PC 7000 handles it just fine, and it's usually far more finicky (being a much older machine). What does Magiduck do on the machine? It's one of the better implementations of the 40 column hack.

I'm actually surprised the PC 3000 would even have snow, since by that point most PC competitors had done away with that flaw.

> I suppose I could have an 'anti-snow' option where I render to a buffer in conventional memory and then copy to VRAM during the VBLANK.

I've played with and tried a lot of things even at the machine language level, and I've never been able to implement anti-snow in the 80 column mode that was "fast enough" for my tastes. You end up spending so much time manually watching for the retrace that you don't have any time left to actually run game logic. It's great for the "demoscene", but the processing power just isn't there to do anything "real" with it once user input and unpredictable rendering are involved, unless you drop down to stuff as moronically simple as "snake".

It's why the frame rate in "Round 42" is pathetically laughable and borderline unplayable on PC/XT until you get at least 2/3rds of the bad guys killed, and it's using monochromatic sprites! I still remember that the first time I saw it I went "is this a joke?". People talk about Round 42 a LOT, but I was always more impressed with "Moon Bugs", and even that was a jerky, tearing mess with an unstable frame rate.
 