16-bit Watcom C and memory models

eeguru · Sep 1, 2017

I'm getting increasingly frustrated with Watcom C in generating code for a real-mode execution environment. I'm trying to generate utility sets for various projects that will run on 8088 on up. I keep hitting weird, ghost-like issues across different projects, which is driving up my frustration, and I am wondering if there is a source of information that could quickly ramp up my knowledge of what to do and what not to do in the different memory models. Here are some examples of my frustrations:

1) I have a utility that hooks the timer ISR using _dos_getvect and _dos_setvect. The timer ISR only increments a global static uint32_t count, which I print from main. If I compile with -ms, it works. If I compile with -ml, it doesn't work. WTF?

2) I have another set of utilities with common code in a library. If I place printfs inside the library, none of them work if I compile the library and utilities with -ms. I have to compile both with -ml before things like printf work consistently... WTF?

I'm pulling my hair out trying to understand where these problems are coming from without having to chew through a bunch of generated assembly to root cause. Are there any resources I can consult for help?

PS. And no, I would rather not use Borland, MS, or other 16-bit compilers. I much prefer to cross compile under Linux without using an emulation environment. For reference, I am using "Open Watcom C x86 16-bit Optimizing Compiler. Version 2.0 beta"

mbbrutman · Sep 1, 2017

I wouldn't use that version of the compiler. Version 1.9 is the current tested version. 2.0 is reputed to have all sorts of poorly tested changes in it.

If you are careful and you specify far pointers in your interrupt service routines and the things that interface with them then your code should compile correctly no matter what memory model you used. I suspect you are having far vs. near pointer problems. Just be very explicit with those pointers, and when in doubt use far.

Also keep in mind that things pushed on the stack will also vary with the memory model that you use.

eeguru · Sep 1, 2017

Just tried 1.9 and both problems still persist. This is really frustrating and I'm not sure where to turn for help. I don't want to abandon either project, but I'm at my rope's end...

eeguru · Sep 1, 2017

If anyone has a Gravis Ultrasound (preferably Classic) and is willing to help debug a port of the SDK to Watcom 16-bit, let me know.

Chuck(G) · Sep 1, 2017

Exactly what is it that fails? Could it be something as simple as not getting your segment registers set up correctly? More information would be welcome.

Worst case, you might do it all with inline assembly.

daver2 · Sep 1, 2017

One thing that people fall over when using interrupts / multithreading and 'C' is the 'volatile' keyword (if it exists in the version you are using).

If variable 'count' can change under the feet of the code, you have to tell the compiler that by using the 'volatile' keyword so the optimisations don't kick in and mess you up. Without it, the code can sometimes work and sometimes not, depending on which optimisations get applied...
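A minimal sketch of the 'volatile' point, not taken from eeguru's code: the names timer_tick and wait_for_ticks are made up for illustration, and the "interrupt" is simulated by an ordinary function call so the snippet can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Shared between the (simulated) interrupt handler and the main loop.
   'volatile' makes the compiler re-read it from memory on every access
   instead of caching it in a register. */
volatile uint32_t count = 0;

/* Hypothetical stand-in for the ISR body. In a real program this would
   be reached via the hardware interrupt, not called directly. */
void timer_tick(void)
{
    count++;
}

/* Loops until 'count' reaches 'target'. 'pump' stands in for interrupts
   arriving asynchronously. Returns the final count. */
uint32_t wait_for_ticks(uint32_t target, void (*pump)(void))
{
    assert(pump != 0);
    while (count < target) {
        pump();
    }
    return count;
}
```

Note that 'volatile' does not make the access atomic. On a 16-bit CPU a uint32_t is read in two halves, so the main program would probably want to mask interrupts around the read (in Watcom, _disable() and _enable() from <i86.h>, as far as I know).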

Please tell us what your definitions of 'work' and 'not work' are in this context.

Dave

eeguru · Sep 1, 2017

It's hard to describe without posting a ton of code. I found a simple timer example and it works in both small and large memory models. The only difference in what I am trying to do is that in the ISR, I chain through two __far defined functions to a handler that increments a volatile uint32_t. In small it works, i.e. I get incrementing counts. In large, I always get zero.

eeguru · Sep 1, 2017

Here is the code base I'm working on. The example program tests the GUS timers. If I compile with -ms, it works. If I compile with -ml, it does not work (count always zero). Thanks in advance.

https://www.retrotronics.org/vcf/gussdk16.zip

reenigne · Sep 2, 2017

DOSBox has GUS support and a debugger. Can you get the -ms version to run correctly in DOSBox? If so, you could put a breakpoint at the interrupt routine in both -ms and -ml versions, step through, and see what they're doing differently. I tried doing this myself, but I've never done anything with GUS before. I made sure the defaults in UltraGetCfg() matched the ones in my DOSBox config file, but I still get "ERROR: No card found".

I took a look at the generated code, and it mostly looks ok. I did notice one potential problem: large model code assumes that the stack segment register is set to the correct segment for global/static data, but the interrupt handler code does not enforce this. If the IRQ occurs when the CPU is running code not from your program (i.e. OS/driver/BIOS/TSR code) then that code may have set up its own stack, breaking this assumption. So in the large memory model, IRQ handlers really need to switch to a stack in the correct segment (and switch it back before returning). It needs to be a second stack (since we don't know where the stack pointer got to in the primary one) and you have to leave interrupts disabled during the entire execution of the handler (otherwise another interrupt could occur and you'd need an unbounded sequence of stacks). I think this is a flaw when using interrupt functions and large memory model in Watcom. I don't think there's a way to switch stacks in C, so you might need to write your own IRQ handler in assembly.

I'm also not 100% sure that this is the problem, since I don't know for sure if kbhit() or other IRQ handlers change SS. But it's the only potential problem I noticed looking at the code.

daver2 · Sep 2, 2017

I also had a quick look this morning.

I couldn't find the actual bit of the code where you were stating that the count was always zero when compiled with -ml. Can you identify a specific .c file and line number (or just post the code snippet and identify the file it came from).

My first thought though was exactly what reenigne had identified - what is the size of available stack when your ISR(s) are entered. If your code uses too much stack (and the definition of 'too much' here is a bit variable as it is up to DOS - or whatever program is executing - when the interrupt occurs) then problems will ensue. This may also account for why things go awry when compiled with -ml - as the return addresses and anything else on the stack will be FAR pointers (i.e. use more stack space).

I would allocate my own stack when an ISR is entered and return the stack (SS/SP) back to what it was before exiting the ISR. This obviously means storing the SS/SP somewhere on entry. I would store SS/SP in two general-purpose registers on entry to the ISR, then allocate a new stack (of sufficient size for your use) and PUSH the old SS/SP onto the new stack. When you have finished your ISR, recover the two words from your stack and store into SS/SP - thus recovering the old stack. If memory serves me correctly - storing something into SS automatically means that the next instruction isn't interruptible (i.e. the stack segment/offset isn't in an undefined state). [EDIT: See http://c9x.me/x86/html/file_module_x86_id_176.html and search for 'inhibit'].

As has also been mentioned, interrupts should never be re-enabled until you have finished your ISR (unless your code is guaranteed to be re-enterable or can't generate interrupts itself until you have finished).

Dave

reenigne · Sep 2, 2017

daver2 said:
My first thought though was exactly what reenigne had identified - what is the size of available stack when your ISR(s) are entered. If your code uses too much stack (and the definition of 'too much' here is a bit variable as it is up to DOS - or whatever program is executing - when the interrupt occurs) then problems will ensue. This may also account for why things go awry when compiled with -ml - as the return addresses and anything else on the stack will be FAR pointers (i.e. use more stack space).

I don't think it's lack of stack space that's the problem here - the three functions gf1_irq_handler, gf1_handler and HandleTimer1 probably use less than 60 bytes of stack space between them. Instead, I suspect that the interrupt routine (including HandleTimer1) is executing with the wrong value in SS, which breaks an assumption in the compiled code (that SS is 0x7fa paragraphs higher than CS).

daver2 said:
I would allocate my own stack when an ISR is entered and return the stack (SS/SP) back to what it was before exiting the ISR. This obviously means storing the SS/SP somewhere on entry. I would store SS/SP in two general-purpose registers on entry to the ISR, then allocate a new stack (of sufficient size for your use) and PUSH the old SS/SP onto the new stack.

The new stack has to be in the same segment as the original stack, so you have to allocate it statically (i.e. as a global array) rather than on the heap (using malloc or equivalent). How big it needs to be depends on what you're doing in the interrupt handler and its callees, but Watcom's default 2kB stack size should be more than enough.

daver2 said:
If memory serves me correctly - storing something into SS automatically means that the next instruction isn't interruptible (i.e. the stack segment/offset isn't in an undefined state). [EDIT: See http://c9x.me/x86/html/file_module_x86_id_176.html and search for 'inhibit'].

Except that this behaviour is buggy on some CPUs. But since interrupts should be off until after the IRET anyway, it doesn't matter in this case.

daver2 · Sep 2, 2017

I have not used the WATCOM C/C++ compiler before (WATCOM FORTRAN yes).

From what I can see in the documentation I have found on the internet, the registers are saved onto the stack that is in force at the time of the interrupt and DS reloaded so that access to program data can be made. Nothing is specifically mentioned about setting up SS/SP - so (my assumption would be) that the interrupt service routine is running with the 'interruptees' stack and (therefore) the size of the WATCOM stack that has been set-up is irrelevant? If you are saying that there is a requirement on the value held in SS then could you provide me with a reference to that (just for my interest). It seems somewhat 'strange' - to say the least - that there is an implied SS=CS+0x7fa paragraphs. From my reading of the documentation, the linker should store the BSS/STACK tagged data at the end of the executable program so it doesn't take up any physical size in the EXE file. It says nothing that I can find about 0x7fa paragraphs anywhere in the documentation I have read. It also 'breaks' Intel's recommendations about not performing arithmetic using segment registers (this was in preparation for protected mode if I remember correctly).

One man's bug is another man's implementation... I remember being on an Intel 286 assembler course in the early days (with a load of IBM engineers from the UK - but that's another story) where exactly the third point of the reference you gave was raised regarding multiple overrides on a repeat string instruction. The recommendation was to always check CX after the instruction and (if non zero) branch back to the start of the instruction (including any prefixes). I suspect this was how it worked at the time - and for subsequent processors/steppings Intel decided to change the way it performed (but in an upwardly compatible manner).

I must admit, it is 'daft' to change SS without interrupts being disabled - but I don't specifically remember anything being said about a bug (or at least not in the 8086/80286).

Interested to learn for myself.

It would be good to see the assembly code that has been generated by the compiler as the prolog to the _interrupt handler...

Dave

eeguru · Sep 2, 2017

Thanks for the feedback. I'm going to dig deeper today. I had assumed I would be running off the interruptee's stack as well - which is the program stack setup by Watcom itself. This isn't a TSR or BIOS routine - the interrupt is hooked at start of program execution and restored at the end.

reenigne · Sep 2, 2017

daver2 said:
From what I can see in the documentation I have found on the internet, the registers are saved onto the stack that is in force at the time of the interrupt and DS reloaded so that access to program data can be made. Nothing is specifically mentioned about setting up SS/SP - so (my assumption would be) that the interrupt service routine is running with the 'interruptees' stack and (therefore) the size of the WATCOM stack that has been set-up is irrelevant?

Correct.

daver2 said:
If you are saying that there is a requirement on the value held in SS then could you provide me with a reference to that (just for my interest). It seems somewhat 'strange' - to say the least - that there is an implied SS=CS+0x7fa paragraphs.

My reference is the assembler output of the compiler for eeguru's program. Here is gf1_irq_handler:

Code:

gf1_irq_handler_:
    push        ax
    push        cx
    push        dx
    push        bx
    push        sp
    push        bp
    push        si
    push        di
    push        ds
    push        es
    push        ax
    push        ax
    mov         bp,sp
    cld
    mov         ax,DGROUP:CONST
    mov         ds,ax
    mov         cx,word ptr __gf1_data+0aH
    mov         bx,cx
    shl         bx,1
    shl         bx,1
    add         bx,cx
    mov         al,byte ptr __gf1_irq+2[bx]
    xor         ah,ah
    mov         dx,ax
    mov         bl,byte ptr __gf1_irq+3[bx]
    xor         bh,bh
    mov         ax,bx
    call        far ptr outp_
    cmp         cx,7
    jle         L$27
    mov         dx,20H
    mov         ax,dx
    call        far ptr outp_
L$27:
    push        cs
    call        near ptr gf1_handler_
    pop         ax
    pop         ax
    pop         es
    pop         ds
    pop         di
    pop         si
    pop         bp
    pop         bx
    pop         bx
    pop         dx
    pop         cx
    pop         ax
    iret

As you can see, it does not modify SS before calling gf1_handler (which is a normal function). And here is HandleTimer1:

Code:

HandleTimer1_:
    mov         ax,4
    call        far ptr __STK
    mov         ax,word ptr ss:_count1
    mov         ax,word ptr ss:_count1+2
    add         word ptr ss:_count1,1
    adc         word ptr ss:_count1+2,0
    retf

The __STK function just checks for stack overflow, it doesn't modify SS or SP. As you can see, this function accesses count1 via an SS: override, i.e. it requires a particular value to be in SS such that the address encoded in the ADD and ADC instructions for count1 is correct. For this particular program (at least, when I compiled it) the required value of SS is 0x7fa paragraphs higher than CS (sorry if I mistakenly implied that value was universal and not specific to this program).
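To make the SS dependency concrete, here is a rough sketch of the real-mode address arithmetic. The function names are made up for illustration and the segment values in the usage note are arbitrary examples, not values from eeguru's program.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset.
   (The 1 MB wraparound on the 8088 is ignored here.) */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* How far off, in bytes, an "ss:offset" access lands when the handler
   runs with 'current_ss' instead of the SS the program was linked for.
   Zero means the access hits the intended variable. */
long ss_access_error(uint16_t program_ss, uint16_t current_ss,
                     uint16_t offset)
{
    return (long)linear_address(current_ss, offset) -
           (long)linear_address(program_ss, offset);
}
```

For instance, ss_access_error(0x17FA, 0x17FA, 0x0200) is zero, while ss_access_error(0x17FA, 0x0070, 0x0200) is not: the same encoded offset for count1 then updates some unrelated memory, and the real counter is never touched.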

daver2 said:
From my reading of the documentation, the linker should store the BSS/STACK tagged data at the end of the executable program so it doesn't take up any physical size in the EXE file.

That's right.

daver2 said:
It also 'breaks' Intel's recommendations about not performing arithmetic using segment registers (this was in preparation for protected mode if I remember correctly).

Usually all the arithmetic is done in the startup code and/or as .EXE relocations. It's quite unavoidable to have some segment arithmetic somewhere when doing real-mode programming and accessing more than 64kB, though. I guess the recommendation is there to avoid having segment arithmetic in assembly code that might later need to be ported to protected mode.

reenigne · Sep 2, 2017

eeguru said:
Thanks for the feedback. I'm going to dig deeper today. I had assumed I would be running off the interruptee's stack as well - which is the program stack setup by Watcom itself. This isn't a TSR or BIOS routine - the interrupt is hooked at start of program execution and restored at the end.

Right, but pieces of DOS/BIOS code (in kbhit()) and other IRQs (and therefore potentially TSRs) do get executed during the running of your program (and therefore potentially get interrupted by your handler). Any of these could set up their own stack temporarily instead of using the program's stack that Watcom set up.

eeguru · Sep 2, 2017

Also is it reasonable to assume that if I chain to the previous handler, and assume other hardware IRQ routines will as well, that the last in chain will be a BIOS routine that performs a PIC EOI? i.e. is it safe to remove my EOI processing in favor of a chain to the previous handler at the end of my ISR?

reenigne · Sep 2, 2017

eeguru said:
Also is it reasonable to assume that if I chain to the previous handler, and assume other hardware IRQ routines will as well, that the last in chain will be a BIOS routine that performs a PIC EOI? i.e. is it safe to remove my EOI processing in favor of a chain to the previous handler at the end of my ISR?

Yes. For all hardware IRQ vectors, the BIOS installs a routine that does an EOI, even if it doesn't know of any hardware that uses that IRQ. It has to, because if it didn't and some unknown piece of hardware did an IRQ for which no corresponding EOI was received, the CPU would receive no more IRQs from that device. Worse, it would also receive no IRQs from any lower priority device.
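For the other case, where a handler does not chain and has to acknowledge the interrupt itself, a sketch of the usual EOI sequence might look like this. It assumes the standard AT layout (master PIC command port 0x20, slave 0xA0, non-specific EOI value 0x20). The names send_eoi and log_port_write are hypothetical, and port writes go through a callback so the logic can be exercised without hardware.

```c
#include <assert.h>

#define PIC1_COMMAND 0x20u  /* master 8259A command port */
#define PIC2_COMMAND 0xA0u  /* slave 8259A command port (AT and later) */
#define PIC_EOI      0x20u  /* non-specific end-of-interrupt command */

typedef void (*port_write_fn)(unsigned port, unsigned value);

/* Test double: records which ports were written, in order. */
unsigned eoi_log[2];
int eoi_log_len = 0;

void log_port_write(unsigned port, unsigned value)
{
    assert(value == PIC_EOI);
    if (eoi_log_len < 2) {
        eoi_log[eoi_log_len++] = port;
    }
}

/* Acknowledges 'irq' (0..15). IRQs 8..15 arrive via the slave PIC, so
   the slave is acknowledged first, then the master. Returns the number
   of port writes performed. */
int send_eoi(int irq, port_write_fn write_port)
{
    int writes = 0;

    assert(irq >= 0 && irq <= 15);
    if (irq > 7) {
        write_port(PIC2_COMMAND, PIC_EOI);
        writes++;
    }
    write_port(PIC1_COMMAND, PIC_EOI);
    writes++;
    return writes;
}
```

In a real build the callback would be a thin wrapper around outp() from <conio.h>. On an 8088-class machine with a single PIC only IRQs 0..7 exist, so the slave branch never runs.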

eeguru · Jun 15, 2019

Just revisited this today... I know I have serious project ADD.

After some playing around, I ended up in a state where the timer counts were still incrementing, but slowly. Which is weird - the hardware didn't change and the timer counts pace correctly with DJGPP and a PM handler. This goes back to reenigne's great observation of the stack segment overrides and Watcom's scheme of SS relativity (for lack of a better description) for near references. As he pointed out, this works when 100% of the execution is in the Watcom-generated context but breaks in an ISR context when the interrupt could land in anyone's back yard. Just changing all references to global variables to far references fixed most of the craziness.
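Roughly, the change looks like the sketch below. This is not the actual SDK code: GUS_FAR, handle_timer1 and read_count1 are names invented here, and the macro collapses to nothing on other compilers so the snippet still builds outside 16-bit Watcom. It relies on __WATCOMC__ and __I86__ being defined by the 16-bit compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper macro: expands to __far only for 16-bit Watcom. */
#if defined(__WATCOMC__) && defined(__I86__)
#define GUS_FAR __far
#else
#define GUS_FAR
#endif

/* Declared far so the compiler is expected to load an explicit segment
   for each access instead of using an SS-relative near reference. */
volatile uint32_t GUS_FAR count1 = 0;

/* Handler body reached from the ISR. It no longer depends on whatever
   SS happened to be when the interrupt fired. */
void handle_timer1(void)
{
    count1++;
}

/* Accessor used by the main program to print the count. */
uint32_t read_count1(void)
{
    return count1;
}
```

The cost is a segment load per access, which is the extra instruction encoding I complain about below.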

It just seems **WEIRD**. Most code dating back to PC BIOS will do something rational like ds = cs, or just assume that this has already been done for small/tiny models. Forget the 'it's the rational thing to do' argument for a second. It saves a ton of bytes in instruction encoding to use default segments (like data segment for data!).

While a lot of ghosts were busted with these types of code changes, I've lost a lot of confidence in Watcom 16-bit. I can only assume the 32-bit generation is as solid as it was in the 90s but if PM was an option for this project, I'd prefer DJGPP. I really don't want to start running TCC.EXE in wine_console to do 16-bit RM development in a Linux environment.

But it looks like I have little choice.

Thanks again for all the help!

daver2 · Jun 15, 2019

Can you post the bit of code where count1 is declared that the HandleTimer1 function uses? I would like to know why it is referring to it via SS.

Just wondering about how it has been declared, as to whether the compiler has been 'told' that count1 is actually stored on the stack (so it will have to refer to it via SS) but it is not smart enough to realise it can't use that mode for an ISR because the stack is no longer valid.

Been a while, so I am just trying to reacquaint myself with the posts...

Dave
