Wolf3D hacked for 8086/8088 CPUs

that would actually take some pretty extreme modification, would be interesting though. CGA Wolf3D.

*shudders*

It couldn't be that bad-looking if you add composite support. . .

On a different subject, when using shift instructions on the 8086, you do:

MOV CL,4
SHL BX,CL (only works with CL)

and not:

SHL BX,1
SHL BX,1
SHL BX,1
SHL BX,1
 
actually, that's exactly what i did the first time around but it didn't work. the problem ended up being related to something else, and i forgot to change it back to CL. thanks for reminding me! that should help noticeably i think. i'll try it again.

iirc, i need to PUSH cx first and POP cx when done though. i wonder how many clock ticks that adds.
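
For a rough answer to the clock-tick question, the sketch below adds up execution clocks and code bytes for both forms. The per-instruction figures are the commonly published Intel timings as best I recall them (SHL reg,1 = 2 clocks; SHL reg,CL = 8 + 4 per bit; MOV reg,imm8 = 4; PUSH reg = 11 on the 8086 / 15 on the 8088; POP reg = 8 / 12), so treat them as assumptions and check a datasheet. The function names are made up for illustration, and instruction fetch time is not modelled.

```c
#include <assert.h>

/* Execution clocks for n copies of "SHL reg,1" (assumed 2 clocks each). */
unsigned shl_ones_clocks(unsigned n)
{
    return 2 * n;
}

/* Execution clocks for "MOV CL,n / SHL reg,CL", optionally wrapped in
 * PUSH CX / POP CX. is_8088 selects the slower 8-bit-bus stack timings. */
unsigned shl_cl_clocks(unsigned n, int save_cx, int is_8088)
{
    assert(n >= 1 && n <= 15);
    unsigned clocks = 4 + (8 + 4 * n);            /* MOV CL,imm8 + SHL reg,CL */
    if (save_cx)
        clocks += is_8088 ? (15 + 12) : (11 + 8); /* PUSH CX + POP CX */
    return clocks;
}

/* Code size in bytes: each SHL reg,1 is 2 bytes; MOV CL,imm8 and
 * SHL reg,CL are 2 bytes each; PUSH CX and POP CX are 1 byte each. */
unsigned shl_ones_bytes(unsigned n) { return 2 * n; }
unsigned shl_cl_bytes(int save_cx)  { return save_cx ? 6 : 4; }
```

Under those assumptions a 4-bit shift costs 8 clocks as four single shifts versus 28 for the CL form, or 55 with PUSH/POP on an 8088. The CL form is half the size (4 bytes versus 8), which matters on the 8088 because it fetches code one byte per 4-clock bus cycle, so the real-world gap is likely smaller than the raw clock counts suggest.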
 
also, i just added a couple more options to my cheat menu.

wolf3dhack2.png




tomorrow i'll do the SHL/SHR _,CL optimization and upload the new EXE w/ the menu. all my cheats are tested and working. :)
 
alright go back to the first post and re-download. i've uploaded the last update i will probably do. i did the SHL/R _,CL optimization (haven't noticed a speed increase though) and the cheat menu is in there.
 
Nice work :)



That's why I chose to make the port for the V20/V30: they handle SHL with multiple shifts at a time. That bunch of SHR xx,1 instructions would be too much for an 8088...

Do you actually have that V20 port somewhere?
Because I just remember that my XT has a V20 installed.
(and a 8087)
 
Yeah, the problem with a PUSH / SHIFT / POP sequence is that the cycle counts probably blow away any "optimization" you may have earned originally. Another idea is to try a less obvious operation in place of a shift, if the shift is greater than four bits. Now, I've been doing 6502 asm of late, and haven't ever really done 8086 assembler (though I mean to get into that sooner rather than later!), but I'm thinking if you needed to do something like a logical shift right by six bits on a byte, you could use a rotate left instruction accompanied by an AND instead...

Starting Value: 11011100
Target: Shifted right six bits = 00000011

11011100 -> ROL -> 10111001 -> ROL -> 01110011
01110011 -> AND #3 -> 00000011

So two ROLs and an AND, instead of 6 SHRs (SHR apparently being the 8086 instruction). This should also work in reverse for a large left shift.

That's just an example, I don't know what kind of shifts you're dealing with. But if the amount is greater than 4, and it's a LOGICAL not arithmetic shift, this type of trick will prove valuable. Saving, Loading, Restoring a register probably isn't going to net that much performance.
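
A quick way to check that the rotate-and-mask trick holds for every byte value is to model it in C. This is only a sketch: `rol8`, `shr6_via_rol` and `shr6_trick_matches` are names made up here, and it models 8-bit operands only (for a 16-bit register the rotate count and mask would differ).

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value left by n bits (0 < n < 8), like n ROL instructions. */
uint8_t rol8(uint8_t v, unsigned n)
{
    assert(n > 0 && n < 8);
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* Logical shift right by 6, done as rotate left by 2 followed by AND 3. */
uint8_t shr6_via_rol(uint8_t v)
{
    return (uint8_t)(rol8(v, 2) & 0x03);
}

/* Returns 1 if the trick agrees with a plain shift for all 256 byte values. */
int shr6_trick_matches(void)
{
    for (unsigned v = 0; v < 256; v++) {
        if (shr6_via_rol((uint8_t)v) != (uint8_t)(v >> 6))
            return 0;
    }
    return 1;
}
```

With the starting value from the example, `rol8(0xDC, 2)` gives 0x73 (01110011) and the AND leaves 0x03.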

Also, you mentioned a 768 byte copy. Is that done every frame or just once in a while? Because if it's every frame, you MAY get a little bit of performance boost by unrolling that loop. (E.g. write 8 bytes 96 times instead of 1 byte 768 times.)
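
As a rough illustration of the unrolling idea, here is the 8-bytes-per-pass version in C. The name `copy_unrolled8` is hypothetical, and this shows the loop shape only; whether it actually wins on an 8088 depends on what the original loop looks like, so it would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes, 8 per loop pass, so the counter update and branch
 * run n/8 times instead of n times. n must be a multiple of 8. */
void copy_unrolled8(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
    }
}
```

For the 768-byte case, `copy_unrolled8(dst, src, 768)` runs the loop body 96 times.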


Finally, porting to EGA or CGA -- shouldn't be IMPOSSIBLE, though I don't know if the "snow" problems of a CGA might cause trouble. The big deal is you'll pretty much have to rewrite the rendering code to deal with the different pixel layout: 16-color modes are either packed 2 pixels per byte (Tandy/PCjr style) or, on a real EGA, split across four bit planes, and 4-color CGA packs 4 pixels per byte. That would be some pretty serious reprogramming. But, if you're dedicated enough... :)
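
To give an idea of what the byte packing means in code, here is a sketch of writing pixels in a 4-color, 4-pixels-per-byte layout like CGA's 320x200 mode, where the leftmost pixel sits in the top two bits of each byte. The function names are made up, and `row` is just a plain buffer standing in for one scanline (CGA's interleaved even/odd scanline layout is not modelled).

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 2-bit pixels into one byte, leftmost pixel in bits 7-6. */
uint8_t pack_cga4(const uint8_t px[4])
{
    return (uint8_t)((px[0] & 3) << 6 | (px[1] & 3) << 4 |
                     (px[2] & 3) << 2 | (px[3] & 3));
}

/* Set pixel x on one scanline to color (0..3): a read-modify-write
 * that clears the pixel's two bits and ORs in the new value. */
void put_pixel_cga4(uint8_t *row, unsigned x, uint8_t color)
{
    assert(color < 4);
    unsigned shift = (3 - (x & 3)) * 2;   /* bit position within the byte */
    uint8_t *b = &row[x >> 2];            /* 4 pixels per byte */
    *b = (uint8_t)((*b & ~(3u << shift)) | ((unsigned)color << shift));
}
```

The extra shift, mask and read-modify-write per pixel is the cost being talked about here, compared with VGA's one byte per pixel.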
 
that's a good point about the shifts, but i haven't seen one that does more than 4 or 5 at a time. most of them are 2 or 3 at a time, and those i didn't even bother replacing because of the PUSH / POP you have to add onto it.

and yeah that's why i didn't want to mess with EGA / CGA. the byte packing. it would be pretty interesting to see though. then i could try it on my supersport 8088 laptop which has a CGA built-in. i'd like to see that because the 8088 in it can be run at a turbo mode of 7.16 MHz instead of 4.77. the byte packing would eat up some CPU though.

the palette code is only used when fading in/out and the menu's red fades.
 
Well, it wouldn't necessarily "eat up" any CPU if you rewrite the renderer from scratch for the new target. :) That's unfortunately about the only "sane" way to do it. I do have a description of a raycast renderer like it uses in a textbook somewhere... don't know how clear it is in the source... but the best performance would likely come from a new renderer.

I figured as much. It would only be every cycle if it were doing palette animations or something.
 
oh i'm pretty familiar with raycaster coding, i wrote this recently in freebasic from scratch:

http://www.youtube.com/watch?v=CaUzyNQgUgc

a couple minor graphical glitches, still working on it.

i'm just not good enough with ASM to write one that'll perform well on an 8088. or write one in ASM at all. :p
 
That's pretty neat, was it running on an 8088? For BASIC, it was fairly smooth.

You know, not many people would create a graphics program totally in assembly. When I used to play around with it, I built up a library of many small routines I could reuse. Anything time-critical I wrote in assembly, with QuickBasic providing the front end. It was a great environment to work in.
 
oh heck no, that's on a 3.4 GHz pentium 4 with windows 7. :p

the raycaster is much smoother than that video makes it look though, it's because of camstudio when i captured it. get around 60-70 FPS. (which isn't that great considering the system i'm running on, and what it is)

QuickBASIC isn't a bad system to work with, obviously i use it plenty with the TCP stuff and whatnot. the compiler generates relatively slow code, but yes, with some ASM compiled OBJ files linked in, it can get pretty powerful.

take a look at freebasic though if you like BASIC. it's amazing. it can compile for windows and linux. http://www.freebasic.net
 
Hello, Mike Chambers, could you share source of 8086/8088 Wolf3D?
 
Good Day, Everyone!

I am having some serious issues getting Wolfenstein 3D running on my Juko ST based XT clone sporting a NEC V20, Intel 8087-1, and TexElec XT-IDE (SD Card).

I have tried running both the wolf8086.exe and 8087v20.exe files. I have tried multiple VGA video cards that run in an 8-bit slot, but all are Oak OTI-037 based, so that doesn't help narrow down any potential video chipset incompatibility. As I legally own multiple copies of Wolfenstein 3D, I have tried various versions/revisions of the software, including shareware versions.

No matter which executable (wolf8086.exe or 8087v20.exe) or which version/revision of Wolfenstein 3D I run, it starts out strong and then devolves into the graphical glitching seen below for the rest of my session. The last picture, 11.jpg, shows what happens when you exit the game: instead of being dumped back to a DOS prompt, I get that. For reference, I PM'd Mike Chambers a few weeks ago regarding Wolf8086 but have received no response yet, so I thought I would try here. I also see JoJo_ReloadeD has answered within this thread, albeit a long time ago, so instead of another PM, I'll give this post a go and ask the community at large for help.

Thank you in advance for any assistance!
 

Attachments: 2.jpg, 3.jpg, 4.jpg, 5.jpg, 6.jpg, 7.jpg, 8.jpg, 9.jpg, 10.jpg, 11.jpg
 
IIRC Wolfenstein 3D uses "Mode Y" (like Mode X but 320x200). Possibly the OTI-037 has issues with that.
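
For context on what is different about that mode: in unchained 320x200 256-color, horizontally adjacent pixels live in different VGA planes, so the right plane has to be selected before each write. Below is a sketch of the address math with made-up function names. The actual register write (the Sequencer Map Mask, index 2 via port 3C4h) is only described in a comment, not performed, and whether the OTI-037 really mishandles this is just the guess above.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset for pixel (x, y): 80 bytes per scanline, because each
 * byte address covers 4 horizontally adjacent pixels (one per plane). */
uint16_t modey_offset(unsigned x, unsigned y)
{
    assert(x < 320 && y < 200);
    return (uint16_t)(y * 80 + (x >> 2));
}

/* Plane (0..3) that holds pixel column x. */
uint8_t modey_plane(unsigned x)
{
    return (uint8_t)(x & 3);
}

/* Value a program would put in the Sequencer Map Mask register to
 * enable writes to only that plane: a single bit, 1 << plane. */
uint8_t modey_map_mask(unsigned x)
{
    return (uint8_t)(1u << (x & 3));
}
```

A card or driver that handles the planar write path differently from the chained mode 13h path would show up as glitches in exactly this kind of code.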
 