
RealDOOM: A port of DOOM to Real Mode

sqpat (Experienced Member; Joined: Mar 21, 2009; Messages: 168; Location: Seattle, WA)
RealDOOM is a port of vanilla Doom (forked from PCDoomv2, to be specific) made to run in Real Mode. (Coincidentally, another project named Doom8088 was being worked on at the same time, with a similar goal but a different starting point.)

current16.png
(rendered from a 16 bit program! photo representative of actual FPS!)

You obviously can't just take the original source and build it for 16-bit. Development was mostly done by continuing to build for 32-bit but with more and more 16-bit style restrictions and code. A few weeks back I was able to build the binary for 16-bit, but it crashed in initialization. Fast forward to this past week, and the 16-bit binary was stable enough to make it out of initialization and run for a bit in text mode. Over the last couple of days I fixed several bugs in the video code, and now the game kind of runs in DOSBox. Bugfixing continues, though. The main task involved moving the program's heap to run off of EMS.

I have been working on this for over 3 months, and I decided to finally announce the project now that it is "sort-of" running properly in 16-bit mode. Timedemos (playback recordings) diverge after a couple hundred frames, so there are definitely still bugs and this is still what I would call an 'alpha' release. Honestly I wanted a bit more polish on this before I posted about it, but I'm going to be traveling for a month very soon and wanted to post this now, as I might not be able to put as much time into it until I'm back.

Below is mostly technical details especially for those knowledgeable about or interested in the DOOM engine.


Technical Details - What was done
As a note - going into this project I'd never written anything in x86 assembly (and still haven't) and had never written 16-bit DOS software. It turned out that this has entirely been a C project so far.

DOOM allocates a huge block of memory at startup, then runs its own allocation scheme within that block in z_zone.c. This was rewritten to use EMS and return "MEMREFs" instead of pointers. A MEMREF can be used to get a pointer from the memory manager, which internally keeps track of which page every variable is in. As pointers to anything in the EMS heap go stale quickly, MEMREFs get stored and passed around instead of pointers. There is a lot of juggling going on to support 4 page frames and 64k of active EMS memory. If you build the project for 32-bit, there is still a big allocation done at startup, the EMS support is emulated, and you can "cheat" and have dozens of page frames.
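To make the MEMREF idea concrete, here is a minimal sketch in the spirit of the emulated-EMS 32-bit build. Everything in it is an assumption for illustration: the names (memref_t, Z_MallocRef, Z_LoadRef), the bump allocator, and the round-robin eviction are hypothetical and are not RealDOOM's actual code. A flat array stands in for EMS, and pages are copied in and out, where real EMS hardware would remap them instead.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  16384u  /* one EMS page is 16 KB */
#define NUM_FRAMES 4       /* 4 frames = the 64 KB window */
#define NUM_PAGES  16      /* size of the emulated EMS pool (assumption) */
#define MAX_REFS   256

typedef uint16_t memref_t; /* hypothetical handle type; 0 means "null" */

static uint8_t ems_store[NUM_PAGES][PAGE_SIZE];  /* stand-in for EMS memory */
static uint8_t frames[NUM_FRAMES][PAGE_SIZE];    /* stand-in for the page frame */
static int     frame_page[NUM_FRAMES] = { -1, -1, -1, -1 };
static int     next_victim;                      /* round-robin eviction */

static struct { uint16_t page, offset; } ref_table[MAX_REFS];
static memref_t next_ref = 1;
static uint16_t alloc_page, alloc_offset;

/* Allocate `size` bytes; an allocation never straddles two pages. */
memref_t Z_MallocRef(uint16_t size)
{
    if (size == 0 || size > PAGE_SIZE || next_ref >= MAX_REFS)
        return 0;
    if (PAGE_SIZE - alloc_offset < size) {  /* does not fit: start a new page */
        alloc_page++;
        alloc_offset = 0;
    }
    if (alloc_page >= NUM_PAGES)
        return 0;
    ref_table[next_ref].page = alloc_page;
    ref_table[next_ref].offset = alloc_offset;
    alloc_offset = (uint16_t)(alloc_offset + size);
    return next_ref++;
}

/* Turn a MEMREF into a pointer. The pointer is only valid until a later
   Z_LoadRef call evicts the page it lives in. */
void *Z_LoadRef(memref_t ref)
{
    int i, page;

    if (ref == 0 || ref >= next_ref)
        return NULL;
    page = ref_table[ref].page;

    for (i = 0; i < NUM_FRAMES; i++)        /* already mapped? */
        if (frame_page[i] == page)
            return &frames[i][ref_table[ref].offset];

    i = next_victim;                        /* otherwise evict a frame */
    next_victim = (next_victim + 1) % NUM_FRAMES;
    if (frame_page[i] >= 0)
        memcpy(ems_store[frame_page[i]], frames[i], PAGE_SIZE); /* write back */
    memcpy(frames[i], ems_store[page], PAGE_SIZE);
    frame_page[i] = page;
    return &frames[i][ref_table[ref].offset];
}
```

The point of the sketch is the calling discipline: code holds on to the memref_t and calls Z_LoadRef again whenever other allocations may have been touched in between, rather than caching the returned pointer.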

Aside from that - lots of changing of variables from 32 bit to 8 or 16 bit when possible... lots of removal of 'minor' features and dead code... really, too much to remember. FastDOOM's early commit history was helpful.

RealDOOM's Goal
The goal is accuracy first. RealDOOM is meant to be able to run with the same graphical fidelity and engine accuracy as the original DOOM. It should support the WADs released by id software. There are some limits to things like texture sizes, map sizes, things like that - so not every custom WAD out there will work - but it should support the original id software content. Timedemos should also play back accurately.

About Speed and Performance
It's very slow now. Right now, DOSBox running the 16-bit executable on my Ryzen 3950X probably gets 0.3-0.5 FPS or so with full detail. At some point when the engine is as stable in 16-bit as it is in 32-bit, I'll probably go ham with optimizations. I think 20x or so speed improvements are pretty straightforward - there's a lot of potentially easy stuff to do (see below). I doubt this will be running at playable speeds on a 286 with the same level of detail as the original game, but we can always strive for that. Things like lower resolution textures and potato level detail and such can be considered, or optimizations based around fixed smaller resolutions.

Performance issues for this port have less to do with how many pixels there are to draw and a lot more to do with memory swapping. It's really not just a matter of reducing the view size and drawing lower quality. Most of the speed improvements will come down to making fewer page switches. In the worst case, hundreds or thousands of EMS page switches are being done per frame. I recently added a conventional memory allocator that uses whatever conventional memory is left at initialization for certain heavily used variables (lines, nodes, etc.), which helps a ton. As the binary size keeps decreasing and more memory is made available, more and more fits in conventional memory and doesn't have to be paged in and out. There are 20k or so unused bytes in the default data segment too, which could be used for something - I just haven't done it yet. If page frames are increased from 4 to 8 or 10 (using C800-EFFF) then things will run even faster. Meanwhile, there's no assembly in there yet either. I'm noticing that the 32-bit and 16-bit versions respond differently to different optimizations, so as the 16-bit version becomes more stable there will be a lot more to try.
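As a quick check on the "8 or 10" page frame figure: if C800-EFFF is read as a range of real-mode segments, it is 0x2800 paragraphs, or 160 KB, which is exactly ten 16 KB pages. A small sketch of that arithmetic follows; the function names are hypothetical and the reading of the range as segments is an assumption.

```c
#include <stdint.h>

#define EMS_PAGE_PARAGRAPHS 0x400u  /* 16 KB / 16 bytes per paragraph */

/* Number of whole 16 KB pages in the segment range [first_seg, last_seg]. */
unsigned ems_frames_in_range(uint16_t first_seg, uint16_t last_seg)
{
    uint32_t paragraphs = (uint32_t)last_seg - first_seg + 1u;
    return (unsigned)(paragraphs / EMS_PAGE_PARAGRAPHS);
}

/* Segment at which page frame number `index` of the window starts. */
uint16_t ems_frame_segment(uint16_t first_seg, unsigned index)
{
    return (uint16_t)(first_seg + index * EMS_PAGE_PARAGRAPHS);
}
```

For comparison, a classic 64 KB window such as D000-DFFF gives 4 frames with the same function.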

A note: on real hardware with a 286, hardware EMS support will be pretty important. (Imagine a poor 286 trying to copy 16KB back and forth thousands of times per frame!) I have a 286 that I've run at over 35 MHz before that I would like to run this on at some point... but that's a story for another time.

Project Status

Known (Major) Issues

- Savegames don't work
- No sound (need to find a 16 bit compatible library or write from scratch?)
- Multiplayer support removed
- Joystick support removed
- Untested outside of doom1 shareware for now.
- 16 bit mode is a bit of a bugfest still

Planned Memory Optimizations
- EMS 4.0 style support for more page frames
- Use of the default data segment - there is over 20k free in there mostly unused. Maybe map objects can all be put in there, or blocklists, or something.
- Continued optimization of binary size + better use of conventional memory to reduce swaps

Planned Assembly Optimizations
- Rewriting FixedMul and FixedDiv in assembly. We are doing thousands of 32 and 64 bit integer multiplication/division calls per gametic/frame... I'm sure this is a huge performance drain. If they can be made any faster in ASM it should help a lot
- R_DrawColumn, etc in assembly
- Other math functions in assembly (eg multiplying by constants, multiplying 16 bit by 32 bit, multiplying 16 bit by 16 bit into 32 bit, etc)
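For reference, this is roughly what the two functions being targeted look like in plain C. It is a sketch modeled on the publicly released Doom source, not RealDOOM's exact code: FixedDiv here uses a 64-bit integer division, and labs is used so the overflow check also works where int is 16 bits wide. It assumes the compiler supports int64_t and does an arithmetic right shift on negative values, which common compilers do.

```c
#include <stdint.h>
#include <stdlib.h>

#define FRACBITS 16
#define FRACUNIT (1L << FRACBITS)

typedef int32_t fixed_t;   /* 16.16 fixed point */

/* Multiply two 16.16 values: 32x32 -> 64-bit product, keep the middle 32 bits. */
fixed_t FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> FRACBITS);
}

/* Divide two 16.16 values, clamping when the quotient would overflow.
   The clamp check also catches b == 0, so no division by zero occurs. */
fixed_t FixedDiv(fixed_t a, fixed_t b)
{
    if ((labs(a) >> 14) >= labs(b))
        return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;
    return (fixed_t)(((int64_t)a * FRACUNIT) / b);
}
```

On a 16-bit target each call expands into a sequence of 16-bit multiplies or a multiword division routine, which is why these two are the obvious candidates for hand-written assembly.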

Planned Feature Optimizations
- At some point I'd like to add some flags for no textures and flats and/or potato quality
- Hybrid EMS/conventional visplanes solution. I have each working independently, but EMS visplanes are notably slow. Maybe 48 or so can be in conventional, and then anything beyond that in EMS, so that the rest of that conventional memory can be used for better purposes.
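The hybrid visplane idea boils down to an index split: the first 48 visplanes live in a conventional array, and anything beyond that is packed into EMS pages. A minimal sketch of the lookup is below. The struct layout is simplified, and the names (R_LocateVisplane, visplane_loc_t) are hypothetical; only the figure of 48 comes from the post.

```c
#include <stdint.h>

#define SCREENWIDTH            320
#define CONVENTIONAL_VISPLANES 48       /* figure floated in the post */
#define EMS_PAGE_SIZE          16384u

typedef struct {
    int32_t height;
    int16_t picnum, lightlevel, minx, maxx;
    uint8_t top[SCREENWIDTH + 2];
    uint8_t bottom[SCREENWIDTH + 2];
} visplane_t;                           /* simplified layout, an assumption */

/* How many whole visplanes fit in one 16 KB EMS page. */
#define VISPLANES_PER_PAGE (EMS_PAGE_SIZE / sizeof(visplane_t))

typedef struct {
    int      in_ems;  /* 0: conventional array, 1: EMS-backed */
    unsigned page;    /* logical EMS page (only meaningful if in_ems) */
    unsigned slot;    /* index within the array, or within the page */
} visplane_loc_t;

/* Decide where visplane number `index` is stored. */
visplane_loc_t R_LocateVisplane(unsigned index)
{
    visplane_loc_t loc;

    if (index < CONVENTIONAL_VISPLANES) {
        loc.in_ems = 0;
        loc.page = 0;
        loc.slot = index;
    } else {
        unsigned overflow = index - CONVENTIONAL_VISPLANES;
        loc.in_ems = 1;
        loc.page = (unsigned)(overflow / VISPLANES_PER_PAGE);
        loc.slot = (unsigned)(overflow % VISPLANES_PER_PAGE);
    }
    return loc;
}
```

Scenes that stay under the conventional limit never touch EMS for visplanes at all, so only unusually complex views pay the paging cost.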


Well, that's the current state of things... as I noted a little earlier, there are still stability issues but it's almost there.

I also want to shout out Viti95, who wrote FastDOOM and contributed a couple of optimizations to RealDOOM as well. If anyone else wants to contribute in some way I'd very much appreciate it. Especially anyone with 8088/286 assembly experience who could knock out assembly versions of those math functions.
 
Very cool! I've been checking out stuff about Doom8088 (not run it yet) and WolfensteinCGA this weekend. WolfensteinCGA is playable on my NEC v30 / 8086.
Trying to see if he can research if there is any V20/V30 speed improvements for the ASM side of things.

Picked up a book locally... 8086/8088/80286 Assembly Language "Revised and Expanded" ... which has been interesting. I only did teeny bits of ASM back in the 90s in high school with Borland Pascal.
Recently, while porting some code from C to Pascal for network-related work, I found that ChatGPT-4 was pretty decent if you constantly tell it the CPU target type. It was also decent at optimizing some ASM and C code I've run into online.
 
I'm thinking about compiling FixedMul and FixedDiv with gcc-ia16 and using its assembly output in Doom8088.
 
I don't see the point. The only reason for such a "port" would be to run it on a 286 (since any 386 and up can run the original code). The fastest 286 ever made was the 25 MHz one by Harris. DOOM will never run any good on that, unless you compromise graphics until it looks worse than Wolfenstein 3D.

But well, good luck anyway. :)
 
Very cool! I've been checking out stuff about Doom8088 (not run it yet) and WolfensteinCGA this weekend. WolfensteinCGA is playable on my NEC v30 / 8086.
Trying to see if he can research if there is any V20/V30 speed improvements for the ASM side of things.

Picked up a book locally... 8086/8088/80286 Assembly Language "Revised and Expanded" ... which has been interesting. I only did teeny bits of ASM back in the 90s in high school with Borland Pascal.
Recently, while porting some code from C to Pascal for network-related work, I found that ChatGPT-4 was pretty decent if you constantly tell it the CPU target type. It was also decent at optimizing some ASM and C code I've run into online.

Interesting. I used ChatGPT earlier today to help me fix a bug in some file-writing code I use for debugging. I've never thought to try it for assembly. I might give it a try. I could also study some x86 assembler, but it won't necessarily make me any good at optimization. I feel like I remember hearing there are old books or libraries with lots of optimized asm routines for different operations somewhere out there.

I'm thinking about compiling FixedMul and FixedDiv with gcc-ia16 and using its assembly output in Doom8088.
Interesting, I guess there's no reason one can't use other compilers to generate code for specific functions. I'm guessing there can also be versions of FixedDiv/FixedMul that are optimized for when one or both arguments are known to be 16 instead of 32 bit.
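As a sketch of what such a specialized version could look like: if one operand is known to fit in a signed 16-bit integer, the 32x32 to 64-bit multiply collapses into two 16x16 to 32-bit multiplies and an add, with no 64-bit intermediate. The name FixedMul_16 is hypothetical, and the code assumes an arithmetic right shift on negative values, as common compilers provide.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point */

/* (a * b) >> 16 for a 32-bit `a` and a `b` known to fit in 16 bits.
   Split a = ah * 65536 + al with ah signed and al unsigned; then
   (a * b) >> 16 == ah * b + ((al * b) >> 16) exactly. */
fixed_t FixedMul_16(fixed_t a, int16_t b)
{
    int16_t  ah = (int16_t)(a >> 16);       /* signed high word */
    uint16_t al = (uint16_t)(a & 0xFFFF);   /* unsigned low word */
    int32_t  hi = (int32_t)ah * b;          /* 16x16 -> 32 */
    int32_t  lo = (int32_t)al * b;          /* 16x16 -> 32, always fits */

    return hi + (lo >> 16);
}
```

For example, 5.5 times 0.25 (a = 0x58000, b = 0x4000) gives 0x16000, which is 1.375. Whether this beats a general routine on a given CPU would still need to be measured.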


I found out something interesting just now. 86box (set up as a late 90s machine with EMM386) can run the 16 bit executable way way faster than dosbox can, at about 10-15 fps in timedemos. I don't know that this says anything about real world performance, but it will make debugging a lot easier for me right now.
 
Before you start optimizing with assembly, I recommend profiling your code to identify the actual bottlenecks. Otherwise you are likely to waste time on insignificant things.
 
Before you start optimizing with assembly, I recommend profiling your code to identify the actual bottlenecks. Otherwise you are likely to waste time on insignificant things.
With all the for loops in the source, it looks (after 10 seconds of research) like loop unrolling in places could help, maybe?

I've used ChatGPT to do some loop unrolling on some old C and Pascal code while playing around recently.

Yes, it makes for bigger source, but then I just move the unrolled loop source to its own file and do an include in the old spot:
{$I processLoop.PAS}
And the code works and moves on with life.
I guess the trick would be to make sure the .exe doesn't get too big for old hardware/DOS to load... so only unroll the loops that are the bottlenecks?
 
I was able to run timedemo 2 on 86box set up as a Celeron 533. There are a couple texture bugs, but it's otherwise stable.

[MEDIA=youtube]sjIFxr6fVPU[/MEDIA]

With all the for loops in the source, it looks (after 10 seconds of research) like loop unrolling in places could help, maybe?

There are some render functions where loop unrolling is known to help, though. I think the unrolled and regular functions both exist in the original codebase. For the average loop, it might be more worth saving the conventional memory. Definitely needs to be tested case-by-case. Kind of related, I'd also be interested in testing the time vs space compiler optimization flags on a file-by-file basis to see if compiling for space is ever worth it in 16-bit. The 32 bit openwatcom compiler seems to get almost twice-as-fast code for 10-20% bigger code or so though.
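To illustrate the trade-off, here is a simplified column drawer in both forms. This is a sketch only: the signatures are hypothetical (the real R_DrawColumn reads its parameters from globals), and the 4x unroll factor is arbitrary. Both versions write identical pixels; the unrolled one does one loop test per four pixels at the cost of more code bytes.

```c
#include <stdint.h>

#define SCREENWIDTH 320
#define FRACBITS    16

/* One pixel of a vertical column: look up the texel, light it, step down. */
#define DRAW_PIXEL()                                        \
    do {                                                    \
        *dest = colormap[source[(frac >> FRACBITS) & 127]]; \
        dest += SCREENWIDTH;                                \
        frac += fracstep;                                   \
    } while (0)

void DrawColumn(uint8_t *dest, int count, const uint8_t *source,
                const uint8_t *colormap, uint32_t frac, uint32_t fracstep)
{
    while (count-- > 0)
        DRAW_PIXEL();
}

void DrawColumnUnrolled4(uint8_t *dest, int count, const uint8_t *source,
                         const uint8_t *colormap, uint32_t frac,
                         uint32_t fracstep)
{
    while (count >= 4) {        /* four pixels per loop test */
        DRAW_PIXEL();
        DRAW_PIXEL();
        DRAW_PIXEL();
        DRAW_PIXEL();
        count -= 4;
    }
    while (count-- > 0)         /* leftover 0-3 pixels */
        DRAW_PIXEL();
}
```

In a build where conventional memory is the scarce resource, each unrolled copy has to earn its bytes, which is why measuring per function makes sense.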

I guess the trick would be to make sure the .exe doesn't get too big for old hardware/DOS to load... so only unroll the loops that are the bottlenecks?

Binary being too big to load into memory is no longer a problem... it was maybe a problem 1-2 months ago. I actually committed the linker map files over time so you can see the conventional memory usage reported by the compiler going down over time. Depending on flags it can be as low as 450k or as high as 570k, depending on runtime vs static allocations. I think we'll get down to 500k or so for the standard build, and then the game will malloc whatever it can to hold the more important variables.

On top of that, Openwatcom also supports overlays, and I can use them to cut conventional memory size by 10-20k, maybe more if I really go hard at it. Until the build is more stable, I don't really want to experiment. Debugging buggy code and adding overlays is a nightmare. :)

All that said, I do think ASM improvements will be a relatively small improvement in performance. I know from 32-bit vanilla Doom that writing the render functions in ASM yields a 30% or so improvement in FPS. Probably more with FastDOOM's improvements. I know that reducing EMS paging can lead to a 2-5x improvement in speed. I am very curious to benchmark what ASM math improvements can do, though.
 
I don't see the point. The only reason for such a "port" would be to run it on a 286 (since any 386 and up can run the original code). The fastest 286 ever made was the 25 MHz one by Harris. DOOM will never run any good on that, unless you compromise graphics until it looks worse than Wolfenstein 3D.

But well, good luck anyway. :)
Doom on a 286 is "one of the most important milestone in Doom porting ever. Countless people have upgraded their 286s in the past for one reason - Doom didn't run on them.
And we're all know that basically 286 has practically the same power than 386 at the same frequency, so with a fast VGA card it should be more than playable."
source

Growing up I had a 286 with VGA and no EMS. Now I'm curious if it's possible to create Doom for such a computer with the aid of a modern C compiler.

Before you start optimizing with assembly, I recommend profiling your code to identify the actual bottlenecks. Otherwise you are likely to waste time on insignificant things.
The original Doom had 7 functions in assembly:
R_DrawColumn, R_DrawColumnLow, R_DrawSpan, R_DrawSpanLow
FixedMul, FixedDiv2
I_ReadJoystick
Without having profiled it, I guess those functions also benefit from assembly in a 16-bit version of Doom.

Doom8088 doesn't use assembly right now. FixedMul and FixedDiv are the only two places in Doom8088 where int64_t is used. This prevents me from compiling with Digital Mars, because that compiler doesn't support int64_t. So I want to rewrite those functions so I can check whether Digital Mars produces faster code than Open Watcom, something dccb claims. I haven't checked Microsoft C/C++ v8 yet.
BTW, Digital Mars has built-in support for EMS.
Doom8088 uses the large memory model, so I can't compile it with gcc-ia16, because that compiler doesn't support that memory model.
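For the int64_t problem specifically, FixedMul can be written with 32-bit arithmetic only, by splitting each operand into a signed high half and an unsigned low half and summing the partial products. The following is a sketch, not code from Doom8088 or RealDOOM; the name FixedMul_no64 is hypothetical. It relies on an arithmetic right shift for negative values and on wrapping unsigned-to-signed conversion, both of which common compilers provide. It rounds toward negative infinity exactly like the 64-bit shift version, which matters for demo compatibility.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point */

/* (a * b) >> 16 without any 64-bit type.
   With a = ah * 2^16 + al and b = bh * 2^16 + bl (ah, bh signed; al, bl
   unsigned), the result is
   ah*bh * 2^16 + ah*bl + al*bh + ((al*bl) >> 16), taken modulo 2^32. */
fixed_t FixedMul_no64(fixed_t a, fixed_t b)
{
    int32_t  ah = a >> 16;                 /* signed high halves */
    int32_t  bh = b >> 16;
    uint32_t al = (uint32_t)a & 0xFFFFu;   /* unsigned low halves */
    uint32_t bl = (uint32_t)b & 0xFFFFu;
    uint32_t result;

    result  = (uint32_t)(ah * bh) << 16;       /* high x high */
    result += (uint32_t)(ah * (int32_t)bl);    /* high x low  */
    result += (uint32_t)((int32_t)al * bh);    /* low  x high */
    result += (al * bl) >> 16;                 /* low  x low  */
    return (fixed_t)result;
}
```

Each partial product is a 16x16 to 32-bit multiply, which maps directly onto the 8086 MUL/IMUL instructions, so a compiler without 64-bit support can still build it.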
 
Profiling is still important now that the code has changed. Suppose EMS page mapping takes 80% of the render time...you'd be optimizing in the wrong place by looking at math functions first.
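A very small instrumentation layer is enough to answer that kind of question. The sketch below accumulates ticks per section and reports each section's share of the total. All names are hypothetical, and the timestamps are passed in by the caller so that any tick source can be used.

```c
#include <stdint.h>

typedef enum { PROF_EMS, PROF_MATH, PROF_DRAW, PROF_SECTIONS } prof_section_t;

static uint32_t prof_start[PROF_SECTIONS];
static uint32_t prof_total[PROF_SECTIONS];

/* Mark the start of a section at tick `now`. */
void Prof_Begin(prof_section_t s, uint32_t now)
{
    prof_start[s] = now;
}

/* Mark the end of a section and add the elapsed ticks to its total. */
void Prof_End(prof_section_t s, uint32_t now)
{
    prof_total[s] += now - prof_start[s];
}

/* Share of all recorded ticks spent in section `s`, in percent.
   Assumes totals stay small enough that total * 100 fits in 32 bits. */
unsigned Prof_Percent(prof_section_t s)
{
    uint32_t sum = 0;
    int i;

    for (i = 0; i < PROF_SECTIONS; i++)
        sum += prof_total[i];
    return sum ? (unsigned)((prof_total[s] * 100ul) / sum) : 0;
}
```

Wrapping the page-mapping call, the fixed-point math, and the column/span drawers in Begin/End pairs for a timedemo run would show directly whether the 80% scenario applies.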

Also remember most people did not have a fast 286. So the reason for upgrading was not just 286->386. Whatever you end up with that is playable on a 12 MHz 286 probably won't be Doom as we know it.
 
Doom88 for IBM PC XT to Real Mode, - good.
The System requires: RAM 640 kb, VGA 256 kb, and MS-DOS v 3.30 ?

 
Hasn't stopped people from running it on a pregnancy tester ......
The screen was displaying Doom. It was not "running" on the pregnancy tester.

As Turing explains, the test's "existing CPU can't be reprogrammed and the existing LCD can only show 4 things, so I had to replace both to make any changes. And the current version doesn't even fit into the shell! (although I'm certain it will when complete)."
 
The screen was displaying Doom. It was not "running" on the pregnancy tester.

Man, I got deja vu so hard that it hurt when I saw the pregnancy tester mentioned. "Hey, wait, didn't I debunk that only a couple weeks ago? Is this the same thread?!?!" No, thank garsh.

If I stick my Raspberry Pi into my toaster am I allowed to say I ported Doom to it? Because, yeah, that pregnancy tester thing is exactly that fake.

... and yeah, to be clear, it's not even that it's "running" Doom on the replacement guts; even the "playable" version (originally it was just playing a canned video) is running the engine remotely, and the video is getting streamed via Bluetooth to the Adafruit Trinket M0 (a 48 MHz 32-bit ARM) behind the OLED display. The CPU in that device probably *could* run Doom if it had enough memory (it's roughly 486-class), but it only has 32K of RAM and 256K of flash so, yeah, not happening.
 
Doom88 for IBM PC XT to Real Mode, - good.
The System requires: RAM 640 kb, VGA 256 kb, and MS-DOS v 3.30 ?

EMS will be necessary - I don't know yet how much it will need but maybe 2-3 MB will be enough, 4 will be enough for sure. (It would be cool if it worked on just 2 MB to play nice with the lo-tech EMS card).
Of course, it will not run fast on an IBM PC XT, but it will run, eventually.

Speaking of EMS, I've been testing the EMS implementation on real hardware and I keep running into problems. I don't know if it's due to trying to make thousands of EMS page swaps a second or what... Down the road this number should go down a whole lot though. For now I'm going to focus on continuing to fix bugs in the 16-bit version, and once more optimizations are in place I'll revisit EMS on real hardware.
 
The screen was displaying Doom. It was not "running" on the pregnancy tester.
Running.. Showing.. whatever. It wasn't the original look and feel is what i am getting at.

Let people have their fun. Even if 286 flavor does work to a degree that is playable and looks good, who is it going to hurt.

This was in the response of "I don't see the point." .... what is the point of anything.
 
I fixed a few bugs today, and now it's running pretty okay on real hardware. An MMX-233 or so is good enough to get a comfortable framerate. I also just ran this on a 20 MHz 286:



Well, it crashed in the end, but it ran for a bit. The bugfixing phase is hopefully almost over - I can't wait to focus on performance improvements.
 
EMS will be necessary - I don't know yet how much it will need but maybe 2-3 MB will be enough, 4 will be enough for sure. (It would be cool if it worked on just 2 MB to play nice with the lo-tech EMS card).
Of course, it will not run fast on an IBM PC XT, but it will run, eventually.

Speaking of EMS, I've been testing the EMS implementation on real hardware and I keep running into problems. I don't know if it's due to trying to make thousands of EMS page swaps a second or what... Down the road this number should go down a whole lot though. For now I'm going to focus on continuing to fix bugs in the 16-bit version, and once more optimizations are in place I'll revisit EMS on real hardware.
Processors 8086 or 8088 address 1 MB memory only in real mode.
Need 80286 processor for 4 MB memory in protected mode, and MS-dos 4.0 or high.
 
Running.. Showing.. whatever. It wasn't the original look and feel is what i am getting at.

If the discussion is about running DOOM under tight hardware constraints (potato-level detail or not) then citing an example where DOOM is being *rendered* by a multi-GHz CPU and only *streamed* to the limited platform *does* seem to me to be a significant, and disqualifying, problem.

Anyway, it also gets under my skin that people keep calling it a “Pregnancy Test” at all because the only original part is the plastic shell. So even that is a lie. People are *actually* running DOOM on ESP32s connected to full color SPI LCDs, with the result tiny enough to fit in a Tic-Tac box, but nobody is claiming that Tic-Tacs can run DOOM. (But if they did it would in fact be more accurate than the pregnancy test claim.)
 
Processors 8086 or 8088 address 1 MB memory only in real mode.
Need 80286 processor for 4 MB memory in protected mode, and MS-dos 4.0 or high.

Yes, the name of the project is RealDOOM, and the game runs in Real Mode. LIM EMS was created as a way to use more than 1 MB of memory in Real Mode via page swapping/remapping techniques.
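The address arithmetic shows why this works within the 1 MB limit. A location in a multi-megabyte EMS pool is identified by a logical page number plus an offset inside that 16 KB page; once the page is mapped into the page frame, the CPU reaches it through an ordinary segment:offset address below 1 MB. The sketch below uses hypothetical names, and the D000 frame segment in the example is an assumption (the actual frame address is reported by the EMS driver).

```c
#include <stdint.h>

#define EMS_PAGE_SIZE 16384ul

typedef struct {
    uint16_t logical_page;  /* which 16 KB page of the EMS pool */
    uint16_t offset;        /* byte offset inside that page */
} ems_addr_t;

/* Split a byte offset into the EMS pool into (logical page, offset). */
ems_addr_t ems_split(uint32_t pool_offset)
{
    ems_addr_t r;
    r.logical_page = (uint16_t)(pool_offset / EMS_PAGE_SIZE);
    r.offset       = (uint16_t)(pool_offset % EMS_PAGE_SIZE);
    return r;
}

/* 20-bit linear address the CPU forms from segment:offset in real mode. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, byte 5 past the 3 MB mark of the pool is logical page 192, offset 5. With that page mapped at segment D000, the CPU reads it at linear address 0xD0005, comfortably inside the 8086's 1 MB address space.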
 