
Wolf3D hacked for 8086/8088 CPUs

Fallo

Experienced Member
Joined
Jan 2, 2009
Messages
432
that would actually take some pretty extreme modification, would be interesting though. CGA Wolf3D.

*shudders*

It couldn't be that bad-looking if you add composite support. . .

On a different subject, when using shift instructions on the 8086, you do:

MOV CL,4
SHL BX,CL (only works with CL)

and not:

SHL BX,1
SHL BX,1
SHL BX,1
SHL BX,1
 

Mike Chambers

Veteran Member
Joined
Sep 2, 2006
Messages
2,621
actually, that's exactly what i did the first time around but it didn't work. the problem ended up being related to something else, and i forgot to change it back to CL. thanks for reminding me! that should help noticeably i think. i'll try it again.

iirc, i need to PUSH cx first and POP cx when done though. i wonder how many clock ticks that adds.
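To put rough numbers on that question, here is a small C sketch that tallies the documented 8086/8088 execution clocks for both forms. The function names are made up for illustration, and the totals ignore instruction fetch and prefetch-queue effects, so treat them as a ballpark rather than a measurement.

```c
#include <assert.h>

/* Execution clocks from the Intel 8086/8088 instruction timing tables.
   Instruction fetch is NOT included; on the 8088's 8-bit bus it often dominates. */
enum {
    CLK_SHIFT_BY_1   = 2,   /* SHL/SHR reg,1 */
    CLK_SHIFT_CL     = 8,   /* SHL/SHR reg,CL: fixed part */
    CLK_SHIFT_CL_BIT = 4,   /* SHL/SHR reg,CL: added per bit shifted */
    CLK_MOV_CL_IMM   = 4,   /* MOV CL,imm8 */
    CLK_PUSH_8088    = 15,  /* PUSH CX on the 8088 (11 on the 8086) */
    CLK_POP_8088     = 12   /* POP CX on the 8088 (8 on the 8086) */
};

/* n separate "SHL BX,1" instructions. */
int clocks_unrolled(int n)
{
    assert(n >= 1);
    return n * CLK_SHIFT_BY_1;
}

/* "MOV CL,n" + "SHL BX,CL", optionally wrapped in PUSH CX / POP CX. */
int clocks_cl(int n, int save_cx)
{
    assert(n >= 1);
    int clocks = CLK_MOV_CL_IMM + CLK_SHIFT_CL + n * CLK_SHIFT_CL_BIT;
    if (save_cx)
        clocks += CLK_PUSH_8088 + CLK_POP_8088;
    return clocks;
}

/* Code size: each shift and the MOV are 2 bytes, PUSH and POP are 1 byte each. */
int bytes_unrolled(int n)  { return 2 * n; }
int bytes_cl(int save_cx)  { return 4 + (save_cx ? 2 : 0); }
```

For a 4-bit shift this gives 8 clocks (8 code bytes) for the four single shifts, 28 clocks (4 bytes) for MOV CL,4 + SHL BX,CL, and 55 clocks (6 bytes) once PUSH CX / POP CX are added. Since the 8088 needs about 4 clocks of bus time per code byte it fetches, the single shifts are probably fetch-bound in practice, which would make the first two roughly a wash and the PUSH/POP version the slowest of the three.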
 

Mike Chambers

Veteran Member
Joined
Sep 2, 2006
Messages
2,621
also, i just added a couple more options to my cheat menu.

[Attached image: wolf3dhack2.png]

tomorrow i'll do the SHL/SHR _,CL optimization and upload the new EXE w/ the menu. all my cheats are tested and working. :)
 

Mike Chambers

Veteran Member
Joined
Sep 2, 2006
Messages
2,621
alright go back to the first post and re-download. i've uploaded the last update i will probably do. i did the SHL/R _,CL optimization (haven't noticed a speed increase though) and the cheat menu is in there.
 

Jorg

Veteran Member
Joined
Aug 31, 2003
Messages
1,322
Location
Switzerland
Nice work :)



That's why I chose to make the port to the V20/V30s.. they handle SHL with multiple shifts at a time.. That bunch of SHR xx,1 ... would be overkill for an 8088...

Do you actually have that V20 port somewhere?
Because I just remember that my XT has a V20 installed.
(and a 8087)
 

southbird

Experienced Member
Joined
Sep 11, 2009
Messages
316
Yeah, the problem with a PUSH / SHIFT / POP sequence is that the cycle counts probably blow away any "optimization" you may have earned originally. Another idea is to try a less obvious operation to stand in for a shift, if the shift is greater than four bits. Now, I've been doing 6502 asm as of late, and haven't ever really done 8086 assembler (though I mean to get into that sooner rather than later!), but I'm thinking if you needed to do something like a logical shift right by six bits, you could use a rotate left instruction followed by an AND instead...

Starting Value: 11011100
Target: Shifted right six bits = 00000011

11011100 -> ROL -> 10111001 -> ROL -> 01110011
01110011 -> AND #3 -> 00000011

So two ROLs and an AND, instead of 6 SHRs (which is apparently what the 8086 requires). This should also work in reverse for a large left shift.

That's just an example, I don't know what kind of shifts you're dealing with. But if the amount is greater than 4, and it's a LOGICAL not arithmetic shift, this type of trick will prove valuable. Saving, Loading, Restoring a register probably isn't going to net that much performance.
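The rotate-and-mask idea can be checked with a short C model of an 8-bit register. The helper names `rol8` and `shr_via_rol` are invented for this sketch; it only demonstrates that the result matches a plain logical shift, not that it is faster on any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value left by n bits (n is taken modulo 8). */
uint8_t rol8(uint8_t v, unsigned n)
{
    n &= 7u;
    return (uint8_t)((v << n) | (v >> ((8u - n) & 7u)));
}

/* Logical shift right by n bits, done as a left rotate by (8 - n)
   followed by an AND that clears the bits that wrapped around. */
uint8_t shr_via_rol(uint8_t v, unsigned n)
{
    assert(n >= 1 && n <= 7);
    uint8_t mask = (uint8_t)((1u << (8u - n)) - 1u);  /* keeps the low 8-n bits */
    return (uint8_t)(rol8(v, 8u - n) & mask);
}
```

With the value from the example, `shr_via_rol(0xDC, 6)` rotates left twice to get 0x73 and masks with 3, giving 0x03, the same as `0xDC >> 6`. The break-even point is half the register width: on an 8-bit value the trick saves instructions for shifts of 5 or more, but on a 16-bit register such as BX a right shift by six would need ten rotates, so it only helps there for shifts larger than 8.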

Also, you mentioned a 768 byte copy. Is that done every frame or just once in a while? Because if it's every frame, you MAY get a bit of a performance boost by unrolling that loop. (E.g. write 8 bytes 96 times instead of 1 byte 768 times.)
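As a minimal illustration of the unrolling suggestion, here is a C version of "8 bytes, 96 times". The function name and the `COPY_BYTES` constant are placeholders; nothing here is taken from the actual Wolf3D source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { COPY_BYTES = 768 };  /* size of the copy mentioned in the thread */

/* Copy n bytes, eight per loop iteration, so the loop counter and branch
   overhead is paid once per 8 bytes instead of once per byte.
   n must be a multiple of 8 (768 = 8 * 96). */
void copy_unrolled8(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
    }
}
```

Calling `copy_unrolled8(dst, src, COPY_BYTES)` runs the loop body 96 times. If the copy is plain memory to memory, a REP MOVSB or REP MOVSW string instruction is the usual way to do it on an 8086/8088 and already avoids the per-byte loop overhead, so unrolling mostly pays off when each byte needs extra work such as an OUT to a port.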


Finally, porting to EGA or CGA -- shouldn't be IMPOSSIBLE, though I don't know if the "snow" problems of a CGA might cause trouble. The big deal is you'll pretty much have to rewrite the rendering code to deal with the different byte packing in 16 (2 pixels per byte) or 4 (4 pixels per byte) colors. That would be some pretty serious reprogramming. But, if you're dedicated enough... :)
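To show what the byte packing means for the inner loop, here is a hedged C sketch of a single-pixel write in CGA's 320x200 four-color layout, using an ordinary 16 KB buffer in place of real video memory. The function names are hypothetical and this is not how any existing Wolf3D port does it.

```c
#include <assert.h>
#include <stdint.h>

/* CGA 320x200x4: 80 bytes per scanline, four 2-bit pixels per byte
   (leftmost pixel in bits 7-6). Even scanlines start at offset 0,
   odd scanlines at offset 0x2000. */
unsigned cga_offset(unsigned x, unsigned y)
{
    return (y & 1u) * 0x2000u + (y >> 1) * 80u + (x >> 2);
}

/* Set one pixel: read the byte, clear the pixel's two bits, OR in the color. */
void cga_put_pixel(uint8_t *vram, unsigned x, unsigned y, uint8_t color)
{
    assert(x < 320 && y < 200 && color < 4);
    unsigned shift = (3u - (x & 3u)) * 2u;   /* 6, 4, 2 or 0 */
    uint8_t *p = &vram[cga_offset(x, y)];
    *p = (uint8_t)((*p & ~(3u << shift)) | ((unsigned)color << shift));
}
```

Each pixel costs a shift, a mask, and a read-modify-write, where the 256-color VGA mode needs only a single byte store. That is the extra work a straight port would add per pixel, and why a renderer that builds whole bytes at a time would likely be the better fit. Note that the sketch covers CGA only: EGA's native 16-color modes are planar (one bit per pixel in each of four planes) rather than two packed pixels per byte, so they would need a different scheme again.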
 

Mike Chambers

Veteran Member
Joined
Sep 2, 2006
Messages
2,621
that's a good point about the shifts, but i haven't seen one that does more than 4 or 5 at a time. most of them are 2 or 3 at a time, and those i didn't even bother replacing because of the PUSH / POP you have to add onto it.

and yeah that's why i didn't want to mess with EGA / CGA. the byte packing. it would be pretty interesting to see though. then i could try it on my supersport 8088 laptop which has a CGA built-in. i'd like to see that because the 8088 in it can be run at a turbo mode of 7.16 MHz instead of 4.77. the byte packing would eat up some CPU though.

the palette code is only used when fading in/out and the menu's red fades.
 

southbird

Experienced Member
Joined
Sep 11, 2009
Messages
316
and yeah that's why i didn't want to mess with EGA / CGA. the byte packing. it would be pretty interesting to see though. then i could try it on my supersport 8088 laptop which has a CGA built-in. i'd like to see that because the 8088 in it can be run at a turbo mode of 7.16 MHz instead of 4.77. the byte packing would eat up some CPU though.

Well, it wouldn't necessarily "eat up" any CPU if you rewrite the renderer from scratch for the new target. :) That's unfortunately about the only "sane" way to do it. I do have a description of a raycast renderer like the one it uses in a textbook somewhere... don't know how clear it is in the source... but the best performance would likely come from a new renderer.

the palette code is only used when fading in/out and the menu's red fades.

I figured as much. It would only be every cycle if it were doing palette animations or something.
 

Mike Chambers

Veteran Member
Joined
Sep 2, 2006
Messages
2,621
oh i'm pretty familiar with raycaster coding, i wrote this recently in freebasic from scratch:

http://www.youtube.com/watch?v=CaUzyNQgUgc

a couple minor graphical glitches, still working on it.

i'm just not good enough with ASM to write one that'll perform well on an 8088. or write one in ASM at all. :p
 

Andretti

Experienced Member
Joined
Sep 10, 2009
Messages
53
Location
Toronto, Ontario
oh i'm pretty familiar with raycaster coding, i wrote this recently in freebasic from scratch

That's pretty neat, was it running on an 8088? For BASIC, it was fairly smooth.

You know, not many people would write a graphics program entirely in assembly. When I used to play around with it, I built up a library of many small routines I could reuse. Anything time-critical I wrote in assembly, with QuickBasic providing the front end. It was a great environment to work in.
 

Mike Chambers

Veteran Member
Joined
Sep 2, 2006
Messages
2,621
oh heck no, that's on a 3.4 GHz pentium 4 with windows 7. :p

the raycaster is much smoother than that video makes it look though, it's because of camstudio when i captured it. get around 60-70 FPS. (which isn't that great considering the system i'm running on, and what it is)

QuickBASIC isn't a bad system to work with, obviously i use it plenty with the TCP stuff and whatnot, but the compiler generates relatively slow code. with some ASM-compiled OBJ files linked in though, yes, it can get pretty powerful.

take a look at freebasic though if you like BASIC. it's amazing. it can compile for windows and linux. http://www.freebasic.net
 

be3

New Member
Joined
Sep 20, 2019
Messages
1
Hello, Mike Chambers, could you share the source of the 8086/8088 Wolf3D?
 