
OpenWatcom 1.9-built binaries lock up on 8088 without FPU

Scali

I've been toying with OpenWatcom as a cross-compiler for my old IBM 5160... and I ran into a problem.
It seemed that as soon as I pulled in any code that used floats, it would lock up, even before it ever got to the main() of my program.
I could not reproduce this problem in PCem, DOSBox or on any of my other real PCs.

I have been doing some debugging on the real metal, and eventually I found out that the problem is in the initialization of the FPU emulation routines, a function called __Fini_FPE_handler, which can be found in the file fpe87.asm.
The problem here is that it is assembled with some fwait instructions. An fwait makes the CPU wait until the coprocessor signals that it is no longer busy. Since there is no actual FPU present to give that signal, the wait never ends, locking up the machine.
I have also found some information on this here: http://www.openwatcom.org/index.php/Math_Coprocessor_Interface_Evolution
"The WAIT/FWAIT instruction thus implements the software-visible portion of the synchronization mechanism. Note that FWAIT is the same instruction as WAIT, and it is a CPU instruction despite the 'F' prefix. The difference between WAIT and FWAIT is that the latter can be eliminated by the linker if an emulation library is used."

Now, apparently these fwaits should have been removed by the linker (I have tried to build it with various settings for FPU emulation), but they were not.
So, I went in there with the ole hex-editor and patched the fwaits to nops in my clibs.lib.
This fixed the lockup, and float code seemed to execute correctly afterwards, so the rest of the code appears to work as it should.

However, I was wondering if other people have encountered this problem as well, and perhaps have a better workaround than this (I may have missed some linker switch somehow?).
 
Upon closer inspection, the code it is running is installing an FPE handler... Which it shouldn't be doing in the first place. Apparently this code is not fully compatible with the emulation settings.
When I looked at the top of fpe87.asm, I also noticed this:
.8087
.286

The .286 directive may be a clue as to why the fwaits were ignored by the linker... On 286 the emulation works in a different way.
I will have to investigate further.
 
"The WAIT/FWAIT instruction thus implements the software-visible portion of the synchronization mechanism. Note that FWAIT is the same instruction as WAIT, and it is a CPU instruction despite the 'F' prefix. The difference between WAIT and FWAIT is that the latter can be eliminated by the linker if an emulation library is used."

That seems weird to me. WAIT and FWAIT are two different mnemonics for the same opcode (0x9b). Linkers work at the binary level, not the level of assembly source, so I would not have expected the linker to be able to tell the difference. Maybe the assembler makes a list of all the FWAITs and puts it somewhere else in the object file so the linker knows where all the FWAITs are for replacing them (saves it having to know all the instruction lengths and getting confused by non-code in the text section).
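The side-table idea guessed at above can be sketched in C. This is only an illustration of the hypothesis, not how the OpenWatcom linker is known to work: the function name patch_fwaits and the plain array of offsets are made up for the example, and a real object file would carry this information in its own record format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_WAIT = 0x9B, OP_NOP = 0x90 };

/* Hypothetical linker step: the assembler has recorded the offset of every
   FWAIT it emitted, so the linker can patch exactly those bytes without
   having to decode the instruction stream. Returns the number of bytes
   that were changed from WAIT to NOP. */
static size_t patch_fwaits(uint8_t *code, size_t code_len,
                           const size_t *fwait_offsets, size_t n_offsets)
{
    size_t i;
    size_t patched = 0;

    for (i = 0; i < n_offsets; i++) {
        size_t off = fwait_offsets[i];
        assert(off < code_len);          /* offsets must lie inside the code */
        if (code[off] == OP_WAIT) {
            code[off] = OP_NOP;
            patched++;
        }
    }
    return patched;
}
```

A byte with the value 0x9B that is not in the list, for example part of an immediate operand, is left alone. That is the advantage of a recorded list over scanning the code for the opcode value.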
 

Yes, that is my interpretation as well... FWAIT has special meaning to the assembler/compiler, so it may result in special info in the .obj.
I think it's rather strange though. From what I understand, 8088 is not capable of emulating x87 instructions through exceptions, because the instructions don't generate an exception to begin with.
Therefore, the code is generated with int-calls instead of actual FPU instructions, and the emulation routine is hooked into these ints. Once the emulator is triggered, it can patch the real FPU instructions back, if it detects an actual FPU.
The conclusion here is: there should not have been x87 instructions in my binary in the first place.
But I think this is merely a symptom. Namely, the code it is trying to run is installing an FPE handler. Which it shouldn't be doing in the first place, because it should have detected that I do not have an FPU.
So the code being in the binary is not necessarily the problem. The code being called on a platform without FPU, that is the problem.
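A minimal sketch of the "patch the real FPU instructions back" step, in C. It assumes the common DOS convention in which INT 34h to INT 3Bh stand in for the x87 escape opcodes D8h to DFh. Whether OpenWatcom's emulator uses exactly these interrupt numbers is an assumption here, and the function name int_to_x87 is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: "CD 34+n" (INT 34h+n) replaces "9B D8+n" (WAIT + ESC n).
   Given a pointer to the two bytes of such an INT instruction, rewrite them
   in place as the real x87 form. Returns 1 if the bytes were patched,
   0 if they were not an emulation interrupt. */
static int int_to_x87(uint8_t *insn)
{
    assert(insn != NULL);
    if (insn[0] == 0xCD && insn[1] >= 0x34 && insn[1] <= 0x3B) {
        insn[1] = (uint8_t)(0xD8 + (insn[1] - 0x34)); /* ESC opcode D8h..DFh */
        insn[0] = 0x9B;                               /* WAIT prefix */
        return 1;
    }
    return 0;
}
```

Under this assumption an emulator that finds a real FPU can rewrite each call site the first time it is hit, so later passes through the same code run at full hardware speed. On a machine without an FPU the bytes stay as INT calls and no x87 opcode is ever executed.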

I have compiled my code with -fpc, which means it emits calls directly to emulation routines, rather than using x87 instructions. This means that the actual application code does not contain any x87 at all, and it should work as expected.
The patches I've done on the fwaits are not really a solution, since the code still 'executes' the FPU instructions. They are just sent to an empty FPU socket, so they are effectively nops. The patch may also break systems with a real FPU that need the fwait.
This makes the initialization go through, and the app will run on my system...

But I guess what we really need to find out is: Why is it trying to init the FPE handler? I have not yet tracked the code down entirely (it seems to generate a list of init-routines in the clib, which are iterated at startup, making it difficult to see where these came from, and why), but it seems like it's missing a check somewhere.
I have seen other x87-code in the library, but none of it was ever called, so it should not be a problem in practice. If I can get a good check working in the FPE handler, things should be fine.

So if anyone has any idea where this check should go... and where it went in the first place (I assume it has been there at some point), that'd be nice :)

Edit: I noticed this line in the fpe87.asm file, in the proc __Init_FPE_handler:
cmp byte ptr __8087,0 ; - quit if no 80x87 present

Apparently there's the check, but it is not in __Fini_FPE_handler.
Perhaps they assumed that this check is enough:
cmp word ptr CS:Save87+2,0 ; - quit if not initialized

But somehow it isn't.
I'll have to see what the values of these two variables are.
Save87+2 should contain the segment of the old int2 handler. Might that be the problem? Perhaps even after it saved it, the segment value is 0? Unlikely though, but who knows.
More likely is that the __8087 value is not initialized properly. Which means it will install the __FPEHandler, even though it shouldn't.
Which means that the check for 0 to see if it was initialized does not cover the check for an 8087.

It seems the emulator overwrites __8087 as well...
So once the emulator is initialized, you can't tell the emulator from the real thing. That would work on a 286 or higher, but not on an 8088, which doesn't raise exceptions for FPU instructions, so you can't hook your emulator in there and just execute real FPU code. So I think the order may just be wrong. If the clib were initialized first, it would see that __8087 is 0, so it would skip the FPU code.
But if the emu is initialized first, it mistakenly thinks it's a real FPU, so it jumps to real FPU code.
There is also a __real8087. I think I may have to change the check to that.
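The ordering concern can be modelled as a toy in C. Everything here is an assumption made for illustration: the struct fields only imitate what __8087 and __real8087 appear to mean from reading the source, and none of the function names exist in the clib.

```c
#include <assert.h>

typedef struct {
    int flag_8087;    /* imitates __8087: nonzero means "FPU usable", real or emulated */
    int real_8087;    /* imitates __real8087: nonzero only for real hardware */
    int touched_fpu;  /* set when real x87 instructions would be executed */
} Runtime;

/* Hardware detection sets both flags from what is in the socket. */
static void detect_hardware(Runtime *rt, int fpu_in_socket)
{
    assert(rt != 0);
    rt->real_8087 = fpu_in_socket;
    rt->flag_8087 = fpu_in_socket;
}

/* Assumed behaviour: the emulator marks the FPU as available when no real
   one exists, overwriting the first flag. */
static void init_emulator(Runtime *rt)
{
    if (!rt->real_8087)
        rt->flag_8087 = 1;
}

/* The FPE handler setup quits early if its flag says "no 80x87 present".
   check_real selects which of the two flags it looks at. */
static void init_fpe_handler(Runtime *rt, int check_real)
{
    int flag = check_real ? rt->real_8087 : rt->flag_8087;
    if (flag == 0)
        return;
    rt->touched_fpu = 1;
}
```

In this model, running init_emulator before init_fpe_handler on a machine without an FPU sets touched_fpu when the handler checks the first flag. Running them in the other order, or checking the second flag, leaves it clear.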
 
Okay, I got it...
The problem was indeed with the __8087 value...
In the file ini87086.asm there is a detection routine for the FPU, which is broken.
It does this:
sub AX,AX
push AX ; allocate space for status word
finit ; use default infinity mode
fstcw word ptr [BP-2] ; save control word
fwait
pop AX
mov AL,0
cmp AH,3
jnz nox87

You can't do that on 8088, because finit is actually a wait+fninit. The wait will lock up, because there's no 8087 installed to signal that it is ready.
I looked through some of my old code, and found a routine of my own that was based on an Intel example.
So I do this instead:
fninit
mov AX,5A5Ah
push AX
fnstsw word ptr [BP-2]
pop AX
cmp AL, 0
mov AL, 0
jnz nox87

The fninit is a 'bare' finit, without a wait prefix.
If there is no 8087, it will just do nothing.
Then the fnstsw will store the status word, if there is an 8087... Else it does nothing.
After the fninit, the status word of a real FPU is always 0. So, if the low byte we pop back is now 0 instead of the 5Ah we pushed, we have an FPU, else we don't.
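The logic of the sentinel test can be modelled in plain C, without touching any hardware. This is a sketch only: the Fpu struct and both function names are invented for the model, and the "instruction" is an ordinary function that either writes memory or does nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the coprocessor socket. */
typedef struct {
    int present;  /* nonzero if an FPU is installed */
} Fpu;

/* Models FNSTSW to memory right after FNINIT: a real FPU stores a status
   word of 0; with an empty socket the memory is left untouched. */
static void model_fnstsw(const Fpu *fpu, uint16_t *mem)
{
    assert(fpu != 0 && mem != 0);
    if (fpu->present)
        *mem = 0x0000;
}

/* Mirrors the assembly: store the sentinel 5A5Ah, let the FPU overwrite it
   if it can, then test the low byte (the cmp AL,0 step). */
static int detect_fpu(const Fpu *fpu)
{
    uint16_t slot = 0x5A5A;
    model_fnstsw(fpu, &slot);
    return (slot & 0xFF) == 0;
}
```

The important property is that nothing in the sequence waits for the FPU. If the socket is empty the sentinel simply survives, and the routine reports "no FPU" instead of hanging.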

With this fixed clib, the code should work again on all CPUs, with or without FPU.

You can find my pre-built libs here: http://bohemiq.scali.eu.org/Watcomclibc8088.zip
Replace the libs in lib286\dos with these, and you should be good to go.
 
Small update: I've been using my own libs for a while now, and both the -fpc (direct calls to FPU emulation) and the -fpi (emit int call + x87 opcode) compiler options work fine on 8088 without FPU, and on machines with FPU (I have no 8088 with FPU, but I assume that works as well).
I have reported this to the OpenWatcom V2 fork as well, and they have made their own fix for this issue: https://github.com/open-watcom/open-watcom-v2/commit/40dfa157a1f419a477a04392473d188a53495966
 