
Possible to do texture mapped pseudo 3D rendering on an 8088 / CGA machine?

I suppose I could have an 'anti-snow' option where I render to a buffer in conventional memory and then copy to VRAM during the VBLANK.

Yes, but be prepared for the speed hit: On a 4.77 MHz PC with CGA, you can copy 200 words, one per hblank, and about 480 words from the end of the visible area to the end of the vblank, for a copy rate of about 1360 bytes per second. So it takes 3/60th of a second to copy an 80x25 text screen completely without snow.

One crazy idea is that the copy to VRAM during the VBLANK could happen on an interrupt whilst rendering out the next frame to another buffer. The interrupt would have to poll the state of the CGA registers, so it wouldn't be ideal, but it would still be better than just idly waiting for the VBLANK.

The only way that could work is to set up a timer interrupt that fires at the end of the display area (PIT divisor is 19912 for stock CGA, if you're curious) and do only the 480 words, because copying hblank words without snow requires nearly complete CPU dedication. Also, you'd have to copy to a hidden page, then flip to it once the screen had been updated, for a tear-free implementation. The 6845 is latched, so thankfully that's easy (just make the switch when done and it will update on the next vertical refresh). And I think I mentioned this already, but it bears repeating: snow can occur even if you're writing to a non-visible area of video RAM.
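The 19912 divisor falls straight out of the stock IBM timing, assuming the usual figures: a 14.31818 MHz master clock, 912 master-clock cycles per CGA scanline, 262 scanlines per frame, and a PIT input clock equal to the master clock divided by 12. A quick check (the constants are the standard design values, not measurements):

```python
# Stock IBM PC / CGA timing constants (assumed design values, not measured).
MASTER_CLOCK_HZ = 14_318_180   # 4 x NTSC colour burst
CYCLES_PER_SCANLINE = 912      # master-clock cycles per CGA scanline
SCANLINES_PER_FRAME = 262      # 200 visible + 62 border/blanking/sync lines
PIT_PRESCALE = 12              # PIT input clock = master clock / 12 (~1.193 MHz)

def cga_frame_rate_hz():
    return MASTER_CLOCK_HZ / (CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME)

def pit_divisor_per_frame():
    """PIT ticks per CGA frame; exact because 912 * 262 is divisible by 12."""
    cycles = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME
    assert cycles % PIT_PRESCALE == 0
    return cycles // PIT_PRESCALE

print(pit_divisor_per_frame())          # 19912
print(round(cga_frame_rate_hz(), 3))    # 59.923
```

Because both clocks derive from the same crystal, a channel 0 interrupt at this divisor stays locked to the display once it has been started at the right moment (for example after polling the CGA status port at 0x3DA). Reprogramming channel 0 also changes the BIOS time-of-day tick, normally about 18.2 Hz, so the handler would need to compensate for that.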

PS how long do I have to wait before my account can post without having to first be approved by a moderator?

About 10 posts :)
 
@deathshadow You are correct, the roll effect just shifts the columns up and down slightly to produce the effect. It definitely helps it feel more '3D'. I have been thinking about how I can do this in my PC port, since the pre-compiled scaler routines have a hard-coded 'horizon' and also have to pack two vertical pixels together, so I might need to make an additional set of scalers to handle it.
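A sketch of why the pixel packing and the roll interact, under some assumptions: every cell holds the upper-half-block character (code page 437, 0xDF), so the attribute's foreground nibble colours the top pixel and the background nibble the bottom one, with blink disabled so all 16 background colours work. Roll is approximated as a per-column vertical offset proportional to the distance from the screen centre. The names `pack_pair`, `roll_offsets` and `pack_column` are hypothetical, not from the demo.

```python
import math

UPPER_HALF_BLOCK = 0xDF  # CP437 upper half block: foreground = top, background = bottom

def pack_pair(top, bottom):
    """Attribute byte for one cell: high nibble = background (bottom pixel), low = foreground (top)."""
    return ((bottom & 0x0F) << 4) | (top & 0x0F)

def roll_offsets(width, roll_radians):
    """Hypothetical per-column vertical shift in pixels approximating a camera roll."""
    centre = (width - 1) / 2
    return [round((x - centre) * math.tan(roll_radians)) for x in range(width)]

def pack_column(pixels, offset, fill=0):
    """Shift a column of pixel colours down by `offset` (may be negative), then pack pairs.

    An even offset only moves whole cells; an odd offset re-pairs every pixel,
    so attribute bytes precomputed for offset 0 can no longer be reused.
    """
    h = len(pixels)
    shifted = [fill] * h
    for y, colour in enumerate(pixels):
        if 0 <= y + offset < h:
            shifted[y + offset] = colour
    return [pack_pair(shifted[y], shifted[y + 1]) for y in range(0, h - 1, 2)]
```

For example, `pack_column([1, 2, 3, 4], 0)` gives `[0x21, 0x43]`, while an offset of 1 gives `[0x10, 0x32]`: every byte changes, which is consistent with needing a second set of scalers for odd offsets.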

I tried Paku Paku on both the Sharp PC-3000 and the HP 100LX, and both have issues with the custom screen mode. It looks like the register for text character height is just ignored in their CGA implementations. Fortunately, neither of them has the snow problem of the original CGA card! Part of me wonders whether some old apps intentionally run slower on these devices because they detect a CGA card. I only noticed the snow in my demo when I was testing in DOSBox-X, which emulates CGA snow.

@Trixter surely 1360 bytes per frame, not 1360 bytes per second? If I only copy the 80x20 windowed area, and only the attribute bytes, then it's only 1600 bytes to copy, but I think it is still going to be a struggle to keep it running at a reasonable rate! I might just ignore the snow issue for now (especially since I don't have actual hardware to test any anti-snow methods, and DOSBox isn't exactly accurate for these types of things).
 
Sorry! Yes, I did mean to write "per frame". At the rates posted, you could update an 80x25 text screen without snow no faster than ~20fps, but this steals time away from other calcs of course.
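The ~20 fps ceiling follows from the figures quoted above, taken at face value: 200 words during horizontal blanking plus about 480 words during vertical blanking, i.e. 1360 bytes per frame at roughly 60 frames per second. A small helper makes the arithmetic explicit; it counts whole frames, since a copy that spills into the next frame delays the flip by a full frame.

```python
import math

HBLANK_WORDS = 200   # one word per horizontal blank, 200 visible lines
VBLANK_WORDS = 480   # approx. words from the end of display to the end of vertical blank
SNOWFREE_BYTES_PER_FRAME = (HBLANK_WORDS + VBLANK_WORDS) * 2   # 1360 bytes
REFRESH_HZ = 60      # CGA actually refreshes at ~59.92 Hz; rounded here

def frames_to_copy(num_bytes):
    """Whole frames needed to copy num_bytes without snow, CPU doing nothing else."""
    return math.ceil(num_bytes / SNOWFREE_BYTES_PER_FRAME)

def max_fps(num_bytes):
    """Upper bound on frame rate if the snow-free copy were the only work."""
    return REFRESH_HZ / frames_to_copy(num_bytes)

full_page = 80 * 25 * 2              # 4000 bytes: characters + attributes
print(frames_to_copy(full_page))     # 3
print(max_fps(full_page))            # 20.0
```

This is an upper bound: it assumes the CPU spends the blanking windows copying and nothing else, so rendering time comes on top of it.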
 
Yeah, I'd consider slow framerates to be way more of an issue than a little snow...especially for those of us who wouldn't even be playing on a real CGA ;)

Which reminds me, I'd be happy to test this thing out on my Tandy. Handles PakuPaku just fine.
 

Even those using an 8088 are likely using an EGA or VGA card.
Even if we're using a 'real' CGA card, it's likely an aftermarket card, which does not suffer from snow.
Even if our card does suffer from snow, I for one would take snow if it meant playable FPS.

Best case, of course, would be you have two algorithms, one which causes snow but is faster, and one that doesn't cause snow but is slower. You'd use the latter perhaps on 'Turbo XTs' or 286es that still have CGA susceptible to snow.
Of course, this may very well require radical or difficult changes, so sticking with snow but fast is likely the best option.

just my 2 bits.
 
Even those using an 8088 are likely using an EGA or VGA card.

I wouldn't say that's likely at all. A lot of IBM PC owners try to keep everything stock, and that includes original CGA.

Clones go two ways: Many without snow (IBM PCjr, Tandy 1000) but some with snow (Olivetti M24, others).

Even if we're using a 'real' CGA card, it's likely an aftermarket card, which does not suffer from snow.

I own two aftermarket CGA cards that do suffer from snow.

Best case, of course, would be you have two algorithms, one which causes snow but is faster, and one that doesn't cause snow but is slower. You'd use the latter perhaps on 'Turbo XTs' or 286es that still have CGA susceptible to snow.
Of course, this may very well require radical or difficult changes, so sticking with snow but fast is likely the best option.

Another option: Run the game in graphics mode, which doesn't suffer from snow. If you target composite color output, writing entire byte columns will give you an effective 80x200 in 16 colors, although the tradeoffs are 1. you can't test it effectively on your little handheld systems, 2. speed, and 3. no hidden pages, it's one giant page.
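To make "writing entire byte columns" concrete: in 640x200 mode on a composite monitor, each group of 4 hi-res pixels is seen as one artifact colour, so a byte whose two nibbles are equal draws one solid block two colour clocks wide, giving 80 such columns per line. Which colour each nibble value produces depends on the card revision and monitor. A minimal sketch, assuming a linear 80x200 buffer in main memory (`composite_byte`, `draw_span` and the buffer layout are hypothetical):

```python
WIDTH_BYTES = 80     # 640 hi-res pixels / 8 pixels per byte
HEIGHT = 200

def composite_byte(nibble):
    """Repeat a 4-pixel pattern in both halves of a byte: one solid 'fat' pixel."""
    nibble &= 0x0F
    return (nibble << 4) | nibble

def draw_span(buf, col, y_top, y_bottom, nibble):
    """Fill rows y_top..y_bottom-1 of byte column `col` in a linear 80x200 buffer."""
    value = composite_byte(nibble)
    for y in range(max(0, y_top), min(HEIGHT, y_bottom)):
        buf[y * WIDTH_BYTES + col] = value

frame = bytearray(WIDTH_BYTES * HEIGHT)   # 16000 bytes, the size of CGA's visible area
draw_span(frame, 10, 50, 150, 0b1010)     # a 100-line wall slice in column 10
```

A linear buffer like this still has to be rearranged into CGA's even/odd scanline interleave when it is copied to VRAM.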

I still think the best compromise is 40-column text mode. Even if you reduce the character cell height to get 50 rows, you still have enough VRAM for two pages, and there's no fear about snow on any system.
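The page arithmetic, as a sketch: a 40x50 page is 40 × 50 × 2 = 4000 bytes, so four fit in the CGA's 16 KB, and flipping means loading the 6845 start address (R12 high, R13 low, via index port 0x3D4 and data port 0x3D5). In text mode the start address counts character cells, not bytes. The function names are hypothetical and only compute the port writes; on real hardware each pair would be an OUT instruction. Spacing pages 2000 cells apart is an assumption; any non-overlapping spacing works.

```python
VRAM_BYTES = 16 * 1024
CRTC_INDEX, CRTC_DATA = 0x3D4, 0x3D5
R12_START_HIGH, R13_START_LOW = 0x0C, 0x0D

def page_cells(cols, rows):
    return cols * rows          # each cell = one character byte + one attribute byte

def pages_that_fit(cols, rows):
    return VRAM_BYTES // (page_cells(cols, rows) * 2)

def flip_writes(page, cols, rows):
    """(port, value) pairs pointing the 6845 at `page`.

    The new start address is latched and takes effect on the next frame,
    so writing it right after finishing the hidden page avoids tearing.
    """
    start = page * page_cells(cols, rows)      # in character cells, not bytes
    return [(CRTC_INDEX, R12_START_HIGH), (CRTC_DATA, (start >> 8) & 0x3F),
            (CRTC_INDEX, R13_START_LOW), (CRTC_DATA, start & 0xFF)]
```

`pages_that_fit(40, 50)` returns 4, and flipping to page 1 writes a start address of 2000 (0x07D0).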
 
Another option: Run the game in graphics mode, which doesn't suffer from snow. If you target composite color output, writing entire byte columns will give you an effective 80x200 in 16 colors, although the tradeoffs are 1. you can't test it effectively on your little handheld systems, 2. speed, and 3. no hidden pages, it's one giant page.
This was how I was originally going to approach it: composite mode for real CGA (with composite out) and 4 colour greys for the handhelds. The only difference being I was originally intending to target 160x200 resolution instead of 80x200. The killer I think is that with only enough VRAM for a single framebuffer it means having to store the framebuffer in RAM (for sprite compositing) and copy to VRAM. Also the higher vertical resolution will mean more memory to copy.

I still think the best compromise is 40-column text mode. Even if you reduce the character cell height to get 50 rows, you still have enough VRAM for two pages, and there's no fear about snow on any system.
I did a quick test with my own code (instead of just running Paku Paku) and found that the HP 100LX just ignores the CRTC registers for text character height. It looks like for the Sharp PC-3000 it actually does work (at least with 40 column mode) even though Paku Paku looked all messed up. Potentially I could make two different renderers: one for the 80x25 standard text mode and a 40x50 tweaked mode to avoid snow. The 40 column mode would still be slower due to the way I'd have to pack the columns.
 
I might just ignore the snow issue for now (especially since I don't have actual hardware to test any anti-snow methods, and DOSBox isn't exactly accurate for these types of things).

DOSBox doesn't emulate CGA snow, but PCem/86Box have that option, and I recently tested it against an XT with a CGA card (both IBM originals). Somehow the PCem/86Box implementations were *surprisingly* accurate at the emulated 4.77 MHz: there was a slight difference in timing, but IIRC it amounted to something like half a scanline out of the vblank's 62. In any case, the "480 words per vertical blank" rule is pretty well tested and accurate, so relying on that (even blindly) may be enough.

However I'd agree that 40x50 text would be best for snow avoidance, perhaps along with the current 80x25 one for snowless setups. Readable status text at 40x50 is an extra complication - not impossible (see Magiduck), but I'm the sort of person who actually enjoys bizarre graphical challenges like that, so YMMV. :)

In any case, this looks pretty awesome (and +1 about that 'roll' effect). Keep it up!
 
I was using DOSBox-X, which has a CGA snow emulation setting. It appears to be quite similar to PCem, but there is still just the arbitrary 'cycles' setting for CPU speed, so you get more or less snow depending on that.

With PCem configured as an IBM XT with a CGA card, my demo runs at ~10 FPS (with lots of snow), which would be just about playable. The 40x50 mode might add too much overhead, at least for the 4.77 MHz machines.

Out of interest I tried Magiduck on my palmtops to see how the 40x50 mode behaved. On the HP 100LX it showed two screens side by side and didn't do the half height characters. On the Sharp PC-3000 it looked promising at first with properly drawn half height characters but every alternate frame is drawn wrong. It looks like when the active text page is changed it is pointing to the wrong bit of memory.
 
A little disappointing but not exactly surprising. Those palmtops don't use a true MC6845, and were mid-'90s products, and at that point in time it would've been wasted effort to bother with 100% register-level CGA compatibility.

They still have pretty interesting video setups in their own rights, at least the HP does. And a straight 80x25 text mode is probably as portable as you can get.
 
A little disappointing but not exactly surprising. Those palmtops don't use a true MC6845, and were mid-'90s products, and at that point in time it would've been wasted effort to bother with 100% register-level CGA compatibility.

Of course, that sort of thing wasn't new in the '90s. Trying to run anything that colors at all outside the lines on an IBM Convertible's built-in LCD results in all kinds of sadness.
 
I guess I was a bit spoiled by the Amstrad PC1512. Despite not having a real CRTC, it did allow changing the number of scanlines per row in both text and graphics modes. So tricks like the 160x100x16 mode worked just fine. The red/cyan/white palettes worked the same as on a real CGA too. I only came across one piece of software (Astro Dodge) which wasn't usable at all due to CRTC reprogramming (setting a 256x200x4 mode). A few other games had horizontal centring controls which didn't do anything.
 
I started porting the rest of the Catacombs of the Damned code to work in DOS. I've attached a work-in-progress demo for anyone interested. It procedurally generates the level on startup so takes a few seconds to begin. Currently it is just rendering world geometry, no sprites. It runs at about 15-20 FPS on my palmtops depending on where you are in the level.

The level is divided into rooms connected by portals for determining visibility. You can hold down CTRL to see the portals rendered in black.

Video capture from DOSBox-X:
 

Attachment: COTD.zip (22.2 KB)
Tested on real hardware (5160 + CGA) and it averaged 7-9 fps: totally playable, very impressive. The snow isn't too big a deal, since it isn't constant; it only appears when the frame updates.

I'm still not clear on how the basic wall calc and rendering engine works, so if there is some write-up you can point me to, I'd love to read it. You established that it wasn't raycasting, but interpolation was involved. So, is it 3-D with only X/Y movement and rotation on one axis?
 
Here is a bit of an explanation for how the demo works.

The level structure is generated into a 2D grid structure by recursively splitting the available area into small rooms. It is similar to the method described in this article. There is a random chance that walls between neighbouring rooms will be removed and large rooms may be cut out in the centre to create a ring shape.
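A minimal sketch of that kind of recursive split, assuming a grid where each cell stores a room id (or a sentinel for solid wall), a one-cell wall line between the two halves of each split, and one doorway per split. The demo's real parameters are not given, so `MIN_ROOM` and the split rule are placeholders, and the random wall removal and ring-shaped cut-outs are left out.

```python
import random

WALL = -1
MIN_ROOM = 3   # hypothetical minimum room side in cells; the demo's value isn't given

def generate_level(width, height, seed=0):
    """Grid of room ids (WALL = solid) built by recursively splitting the interior."""
    rng = random.Random(seed)
    grid = [[WALL] * width for _ in range(height)]
    next_id = 0

    def split(x, y, w, h):
        nonlocal next_id
        vertical = w >= h                          # cut across the longer side
        size = w if vertical else h
        if size < 2 * MIN_ROOM + 1:                # can't fit two rooms plus a wall line
            for yy in range(y, y + h):
                for xx in range(x, x + w):
                    grid[yy][xx] = next_id
            next_id += 1
            return
        cut = rng.randint(MIN_ROOM, size - MIN_ROOM - 1)   # offset of the wall line
        if vertical:
            split(x, y, cut, h)
            split(x + cut + 1, y, w - cut - 1, h)
            line = [(x + cut, yy) for yy in range(y, y + h)]
            before, after = (-1, 0), (1, 0)        # neighbours left / right of the line
        else:
            split(x, y, w, cut)
            split(x, y + cut + 1, w, h - cut - 1)
            line = [(xx, y + cut) for xx in range(x, x + w)]
            before, after = (0, -1), (0, 1)        # neighbours above / below the line
        # Doorway candidates: wall-line cells with open cells on both sides.
        doors = [(cx, cy) for cx, cy in line
                 if grid[cy + before[1]][cx + before[0]] != WALL
                 and grid[cy + after[1]][cx + after[0]] != WALL]
        if doors:
            cx, cy = rng.choice(doors)
            # The doorway joins the room before it; its far edge becomes a portal.
            grid[cy][cx] = grid[cy + before[1]][cx + before[0]]

    split(1, 1, width - 2, height - 2)             # leave a solid border
    return grid
```

Seeding a private `random.Random` keeps a given level reproducible, which is handy when the level is regenerated on startup.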

For each room, a set of walls is calculated wherever an empty cell borders a wall cell. For each edge that connects two neighbouring rooms, a 'portal' is created. The end-point vertices of each wall / portal are stored in a list, and vertices can be shared between connected walls.
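The wall/portal extraction can be sketched as a scan over cell edges, assuming a grid of room ids with `WALL` marking solid cells and anything outside the grid treated as solid. Vertices are grid corners stored once in a shared list, so edges that meet reuse the same vertex index. Every edge is oriented consistently (the owning cell is always on the same side of v1 → v2), which is what a screen-space back-face test relies on. `extract_edges` and the tuple formats are hypothetical; a real implementation would probably also merge collinear edges into longer walls.

```python
WALL = -1
# Per neighbour direction: (dx, dy, v1 corner offset, v2 corner offset), with
# x = column and y = row, oriented the same way round for every cell.
EDGES = [
    (1, 0, (1, 0), (1, 1)),    # east
    (-1, 0, (0, 1), (0, 0)),   # west
    (0, 1, (1, 1), (0, 1)),    # south
    (0, -1, (0, 0), (1, 0)),   # north
]

def extract_edges(grid):
    """Return (vertices, walls, portals).

    vertices: list of (x, y) grid corners, shared between edges that meet.
    walls:    {room: [(v1, v2), ...]} as vertex indices.
    portals:  {room: [(v1, v2, other_room), ...]} as seen from `room`.
    """
    vertices, index = [], {}

    def vid(p):
        if p not in index:
            index[p] = len(vertices)
            vertices.append(p)
        return index[p]

    walls, portals = {}, {}
    h, w = len(grid), len(grid[0])
    for y in range(h):
        for x in range(w):
            room = grid[y][x]
            if room == WALL:
                continue
            for dx, dy, a, b in EDGES:
                nx, ny = x + dx, y + dy
                other = grid[ny][nx] if 0 <= nx < w and 0 <= ny < h else WALL
                if other == room:
                    continue                       # interior edge of the same room
                v1 = vid((x + a[0], y + a[1]))
                v2 = vid((x + b[0], y + b[1]))
                if other == WALL:
                    walls.setdefault(room, []).append((v1, v2))
                else:
                    portals.setdefault(room, []).append((v1, v2, other))
    return vertices, walls, portals
```

For a 1x2 grid `[[0, 1]]` this yields 6 shared corners, three walls per room and one portal in each direction between rooms 0 and 1.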

This is how the visibility algorithm works:
The renderer has a queue of rooms to draw. It first finds which room the camera is in and adds it to the queue. It can do this quickly because the 2D grid structure stores which room each cell is associated with, so this can be deduced based on the cell that the camera is in.
For each room in the queue:
- Transform all the vertices into view space and render the walls (more details below on how this works)
- For each visible portal:
  - If the connected room has not yet been rendered, add it to the queue

Some extra data in the render queue stores the left and right clipping regions. Think of a portal as the doorway into the next room: anything in that room that falls outside the doorway's region on screen is not visible, so rendering it is skipped. There is also an optimisation that limits how many portals deep the renderer will go.
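The traversal described above amounts to a breadth-first walk over rooms that narrows a screen-space clip window at each portal. A sketch under simplifying assumptions: each portal has already been projected to a column span `[left, right)` for the current camera (in the real renderer that would come from transforming its two vertices, with back-facing or behind-camera portals left out), and `max_depth` stands in for the portal-depth limit. `visible_rooms` is a hypothetical name.

```python
from collections import deque

def visible_rooms(start_room, portals, screen_width, max_depth=4):
    """Breadth-first portal traversal.

    portals[room] -> list of (other_room, left, right): the portal's screen-space
    column span [left, right) for the current camera.
    Returns (room, clip_left, clip_right) tuples in draw order.
    """
    order = []
    queue = deque([(start_room, 0, screen_width, 0)])
    queued = {start_room}
    while queue:
        room, clip_l, clip_r, depth = queue.popleft()
        order.append((room, clip_l, clip_r))       # walls drawn only inside [clip_l, clip_r)
        if depth >= max_depth:
            continue                               # portal-depth limit reached
        for other, left, right in portals.get(room, []):
            l, r = max(left, clip_l), min(right, clip_r)
            if l >= r or other in queued:
                continue                           # clipped away, or room already queued
            queued.add(other)
            queue.append((other, l, r, depth + 1))
    return order
```

Marking a room as queued the first time any portal reaches it mirrors the "not yet been rendered" check; in this sketch a room seen through two different portals is clipped to the first one only, which can hide parts visible through the second.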

For the actual wall rendering:
Walls are just lines which are stored as two 2D points (vertices).
- Each vertex is transformed from world space (x, y) to camera space (x, z), by translating relative to the camera and rotating by the camera's yaw.
- Each vertex in camera space can be thought of as being in the space (x, z), where z is the depth from the camera.
- If both vertices are behind the front clipping plane, then the wall is behind the camera and can be discarded.
- If one vertex is behind the clipping plane, then it is moved along the wall to the point where the wall crosses the clipping plane, so that the clipping plane is 'cutting' the line.
- The x coordinate has perspective projection applied: x = K * x / z, where K is some constant
- The x coordinate needs to be transformed to screen (pixel) space: x = x + half_display_width
- For each vertex a w value is generated: w = wall_height * K / z, where K is the constant
- The w value is the scale of the wall in pixels at each end of the wall
- The w value between each end of the wall can be calculated by doing linear interpolation. I use something a bit like Bresenham's line algorithm to do this.
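The per-wall steps can be sketched in floating point; the demo presumably uses fixed point and tables on the 8088, which isn't described. The conventions here are assumptions: map x is the column, map y the row, yaw 0 faces +x, and camera-space x grows towards the right of the view. `NEAR` and the function names are hypothetical. One property worth noting for the interpolation step: w is proportional to 1/z, and for a straight wall 1/z varies linearly with screen x, so interpolating w linearly between the two ends is exact rather than an approximation.

```python
import math

NEAR = 0.1   # hypothetical front clipping plane distance

def to_camera(px, py, cam_x, cam_y, yaw):
    """World (x, y) -> camera (x, z): translate to the camera, then rotate by -yaw."""
    dx, dy = px - cam_x, py - cam_y
    c, s = math.cos(yaw), math.sin(yaw)
    return (-dx * s + dy * c, dx * c + dy * s)    # (right, forward)

def project_wall(v1, v2, cam, k, wall_height, half_width):
    """Return ((sx1, w1), (sx2, w2)) in screen space, or None if fully behind NEAR.

    cam is (cam_x, cam_y, yaw); k is the projection constant K.
    """
    (x1, z1), (x2, z2) = (to_camera(*v, *cam) for v in (v1, v2))
    if z1 < NEAR and z2 < NEAR:
        return None                               # whole wall behind the camera
    # Move the end that is behind the plane to where the wall crosses it.
    if z1 < NEAR:
        t = (NEAR - z1) / (z2 - z1)
        x1, z1 = x1 + t * (x2 - x1), NEAR
    elif z2 < NEAR:
        t = (NEAR - z2) / (z1 - z2)
        x2, z2 = x2 + t * (x1 - x2), NEAR
    ends = []
    for x, z in ((x1, z1), (x2, z2)):
        sx = k * x / z + half_width               # perspective divide, then to pixels
        w = wall_height * k / z                   # wall scale in pixels at this end
        ends.append((sx, w))
    return tuple(ends)
```

For a camera at (0.5, 0.5) facing +x, the wall from (1, 0) to (1, 1) with K = 40 and a half width of 40 spans screen columns 0 to 80 with w = 80 at both ends.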

A few extra details:
- Walls are back face culled. This is simple: if the second vertex's screen space x value is to the left of the first vertex's screen space x value then you are looking at the back face of the wall.
- The wall rendering doesn't draw directly into the frame buffer. A 1D array of w values is stored first (wBuffer). It is the size of the display width. The closest (i.e. largest) w value is kept in the case of overlapping walls.
- For each value in the wBuffer, a precompiled scaler routine is called based on the scale of the wall. This is an optimised unrolled loop that draws to the frame buffer.
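The w-buffer pass can be sketched with integer stepping in the spirit of Bresenham's algorithm: one division per wall splits the w change into a whole step per column plus a remainder tracked by an error term, so the per-column loop only adds and compares. Inputs are assumed to be already rounded to integer columns and heights, and the optional clip range corresponds to a portal's clipping region. `fill_wbuffer` is a hypothetical name, and the scaler call that would follow is left out.

```python
def fill_wbuffer(wbuf, x1, w1, x2, w2, clip_l=0, clip_r=None):
    """Rasterise one projected wall into wbuf, keeping the nearest (largest) w per column.

    x1, x2: integer screen columns of the two ends; w1, w2: integer heights in pixels.
    Returns False if the wall is back-facing and was culled.
    """
    if clip_r is None:
        clip_r = len(wbuf)
    if x2 < x1:
        return False                       # second end left of the first: back face
    dx = x2 - x1
    dw = w2 - w1
    sign = 1 if dw >= 0 else -1
    # One division per wall: whole step per column plus a remainder for the error term.
    q, r = divmod(abs(dw), dx) if dx else (0, 0)
    w, err = w1, dx // 2                   # start half a step in to round to nearest
    for x in range(x1, x2 + 1):
        if clip_l <= x < clip_r and w > wbuf[x]:
            wbuf[x] = w                    # nearer (larger) wall wins
        w += sign * q
        err += r
        if dx and err >= dx:               # remainder overflowed: one extra unit
            w += sign
            err -= dx
    return True
```

With the error term starting at half a column, the value written at `x2` is exactly `w2`, and intermediate columns are rounded to the nearest integer, e.g. a wall from (0, 0) to (3, 2) fills `[0, 1, 1, 2]`.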

Hope this all makes sense!
 
With this approach, walls aren't required to be at 90 degrees. The catacombs could be at all kinds of crazy angles. Looking forward to more!
 

Yes, technically the geometry could have any shape and doesn't need to be limited to the grid. I'm doing it this way because I already have a bunch of systems relying on a grid structure: level generation, collision detection and enemy AI.
 
I've noticed a problem when using text mode on the different palmtop devices. They treat text colour attributes differently, so although they both have different shades of grey in their display, there is no consistent way to map them between devices. I also noticed that the Sharp in particular appears to change the level of grey depending on the background / foreground colour combination (probably to help increase contrast). Unfortunately this means there is no guaranteed way to map the graphics in text mode, even if just targeting black on white.

For this reason I had another look at graphics mode. There are a few pros and cons that I'm weighing up:

Text mode:
+ Full 16 colours
+ Enough memory to page flip so can write directly to VRAM without artefacts
- Snow in 80 column mode on true CGA card
- Problems rendering on monochrome LCD based displays

Graphics mode:
+ Consistent rendering across devices
+ Can have high resolution status bar / window dressing
- Limited colour palette (although can use composite colour on true CGA card)
- Frame buffer needs to be in main memory and requires copy to VRAM

Here is a comparison of how the different modes look:

Text mode: [image: text.png]
RGB graphics mode: [image: rgb.png]
Composite graphics mode: [image: composite.png]

For a performance comparison of the scene above (this is DOSBox, so only a rough guide, although it performs similarly to my HP 100LX palmtop):
Text mode: 18fps
Graphics mode: 13fps
So graphics mode takes quite a hit in performance due to the frame buffer copy from main memory to VRAM. It's still a playable frame rate on my palmtops but a 4.77MHz 8088 would likely struggle a bit. I could maintain both modes but it is a bit of a pain so I'm leaning more towards the graphics mode.

I also experimented with an option that copies only even / odd lines every alternate frame which brings performance back up to 15fps but introduces a ghosting effect as you move around which is quite ugly.
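Two observations on the even/odd idea, sketched under assumptions. First, CGA graphics memory already splits scanlines that way: even lines live in the bank at B800:0000 and odd lines at B800:2000, 80 bytes per line. If the RAM frame buffer mirrors that interleaved layout, copying only the even (or odd) lines becomes one contiguous 8000-byte block move (a single `rep movsw` on the 8088) instead of 100 row copies, at the cost of a more awkward address step when drawing down a column. The ghosting would presumably remain, since one bank is still a frame behind the other. Second, the DOSBox numbers quoted above imply the full copy costs roughly 21 ms per frame and the half copy recovers about 10 ms of that, which fits the copy being the main extra cost (with DOSBox timing only a rough guide). Function names are hypothetical.

```python
BYTES_PER_LINE = 80
BANK_SIZE = 0x2000          # odd scanlines start 8 KB into the CGA segment
LINES = 200

def cga_offset(y):
    """Byte offset of scanline y in CGA 320x200 / 640x200 graphics memory."""
    return (y & 1) * BANK_SIZE + (y >> 1) * BYTES_PER_LINE

def bank_slice(parity):
    """(start, length) of the contiguous block holding all even (0) or odd (1) lines."""
    return parity * BANK_SIZE, (LINES // 2) * BYTES_PER_LINE

def copy_bank(vram, ram, parity):
    """Copy one bank; with matching layouts this is a single block move."""
    start, length = bank_slice(parity)
    vram[start:start + length] = ram[start:start + length]

def frame_ms(fps):
    return 1000.0 / fps

copy_cost = frame_ms(13) - frame_ms(18)          # graphics vs text mode, ~21.4 ms
half_copy_saving = frame_ms(13) - frame_ms(15)   # even/odd alternate copy, ~10.3 ms
```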

Any CGA graphics mode tricks I should be aware of that could improve performance?
 