sergey
Veteran Member
Recently I got this bug report on the 8088 BIOS. I'll describe the debugging procedure and other interesting findings for everyone's enjoyment.
The bug reporter observed that the SysChk utility hangs on a system with the 8088 BIOS, but works OK on the same system with GLaBIOS, so it does appear to be BIOS related.
Symptoms seem to be related to the UART/Serial port and the interrupt controller (PIC).
- I compared the PIC and UART initialization between my implementation, GLaBIOS and the original IBM XT BIOS, and did not find anything relevant... While the initialization procedures are slightly different, they result in the same configuration. I tried to find an IBM XT emulator that would allow some degree of logging of I/O port accesses to better understand any differences, but I couldn't find anything suitable without investing too much time... DOSBox is not one of these emulators, if anything...
- I decided to determine where in the code SysChk was hanging... Easier said than done. Once it hangs, DOS DEBUG is not able to interrupt it. Then I recalled that many years ago I had used the NMI for debugging a similar situation. I implemented an NMI handler that prints the registers and the code around the current CS:IP location. I ran SysChk, waited until it hung, and generated an NMI using a piece of wire to connect the ISA A1 (/IOCHCHK) pin to the B1 (GND) pin... I know, I could have used a small flathead screwdriver instead...
- The hang was happening in what appears to be an interrupt service routine for one of the IRQs (later I found that it was IRQ3). The code was pretty short, and it wasn't a big problem to disassemble and understand it. At the same time it wasn't clear why that code would hang with my BIOS, but wouldn't hang with the other BIOS... I spent an hour unsuccessfully trying to understand what the deal was.
- I had a thought of disassembling SysChk, but it appeared to be a compressed SFX binary, and I didn't want to spend time trying to find a decompressor for it.
- Finally I went through the process of running SysChk under DOS DEBUG, stepping over the "CALL" instructions, trying to narrow down to the place where it was hanging.
- Note on DEBUG commands: p - "proceed" is the command to step over the CALL instructions, vs. t - "trace" - stepping into CALL
- Fairly quickly I found that the IRQ detection procedure that was causing the hang didn't run at all on GLaBIOS... SysChk was checking a variable, set earlier, to determine whether to run IRQ detection
- So I had to do another series of runs under DOS DEBUG to find where and how this variable was being set
- It turns out that SysChk calls INT 15h AH=0C0h (get system configuration parameters), and it appears to check bit #1 of the first feature information byte (offset 5 in the returned table), which would be set on an MCA system. SysChk then skips the IRQ check on an MCA system (it is either not needed, or doesn't work there?!). Now, my BIOS does implement that function, and returns the correct data (non-MCA system), while GLaBIOS does not implement that function, and returns CF=1 AH=86h, as it should if the function is not implemented. However, SysChk does not check whether the function call was successful. It simply uses the ES:BX value, and assumes that the system configuration parameters structure will be there.
- It so happens that the initial ES:BX value is 0000:0000, so SysChk goes and checks the byte at 0000:0005, which is actually a part of the INT 1 vector, and it so happens that bit #1 is set there... (BTW, apparently DOS implements its own INT 1 ISR, and perhaps most DOS versions have a similar value there?!). And since that bit is set, it presumes that the system is MCA, and skips the IRQ check
- Now it is a good question why exactly the IRQ check would hang. It appears to be a combination of two bugs:
- Hardware bug: on a typical IBM PC/XT, as well as on the Micro 8088, IRQ signals are left floating (mental note to self - put some pull-downs there next time). Perhaps it doesn't cause much trouble with the older NMOS 8259, but with the CMOS chipset it seems that the PIC reads floating IRQ signals as switching between 0 and 1 all the time. Normally, this shouldn't cause issues as all unused interrupts are masked at the PIC
- Software bug: SysChk implements its own ISRs for IRQ3, IRQ4, IRQ5, IRQ7 (presumably all IRQs that COM or LPT ports can use), and unmasks these interrupts. I assume the idea is that it then tries to trigger an interrupt using a COM or an LPT port, and checks which IRQ level fires. Instead, due to the floating IRQs, it results in an interrupt storm, and possibly in a stack overflow and a hang...
- The bug reporter eventually implemented pull-downs on the IRQ lines, and that resolved the issue for him...
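
For anyone who wants to see the INT 15h AH=0C0h part in isolation, here is a minimal Python sketch of the unchecked-return problem described above. It is only a model, not SysChk's actual code: the function names, the table address F000:E000, the table contents and the INT 1 handler offset 0756h are all made-up values for illustration. The only things taken from the description above are the CF=1 / AH=86h "not implemented" return, the feature byte at offset 5, and bit #1 meaning MCA.

```python
# Model of "INT 15h AH=C0h, then read ES:BX+5 without checking CF".
# All names and concrete addresses/values here are hypothetical.

MCA_BIT = 0x02          # bit #1 of feature byte 1 (table offset 5)
FEATURE_OFFSET = 5


def linear(seg, off):
    """Real-mode segment:offset to a 20-bit linear address."""
    return ((seg << 4) + off) & 0xFFFFF


def build_memory():
    """1 MiB of RAM/ROM with an INT 1 vector and a config table."""
    mem = bytearray(1 << 20)
    # INT 1 vector lives at 0000:0004..0007 (offset word, then segment word).
    # Assumed handler offset 0756h: its high byte (07h) lands at 0000:0005
    # and happens to have bit #1 set.
    mem[4:6] = (0x0756).to_bytes(2, "little")
    # Config table at an arbitrary ROM address (assumption): length=8,
    # model FEh, submodel 0, revision 0, feature byte 1 = 00h (not MCA).
    table = linear(0xF000, 0xE000)
    mem[table:table + 6] = bytes([0x08, 0x00, 0xFE, 0x00, 0x00, 0x00])
    return mem


def int15_c0_implemented():
    """BIOS that implements the call: CF=0, AH=0, ES:BX -> table."""
    return {"cf": 0, "ah": 0x00, "es": 0xF000, "bx": 0xE000}


def int15_c0_not_implemented(es=0x0000, bx=0x0000):
    """BIOS without the call: CF=1, AH=86h, ES:BX left as the caller had them."""
    return {"cf": 1, "ah": 0x86, "es": es, "bx": bx}


def is_mca_unchecked(mem, regs):
    """The buggy pattern: trust ES:BX regardless of the carry flag."""
    feature = mem[linear(regs["es"], regs["bx"]) + FEATURE_OFFSET]
    return bool(feature & MCA_BIT)


def is_mca_checked(mem, regs):
    """The safe pattern: a failed call means 'no table', so not MCA."""
    if regs["cf"]:
        return False
    feature = mem[linear(regs["es"], regs["bx"]) + FEATURE_OFFSET]
    return bool(feature & MCA_BIT)


if __name__ == "__main__":
    mem = build_memory()
    for name, regs in (("implemented", int15_c0_implemented()),
                       ("not implemented", int15_c0_not_implemented())):
        print(name, "unchecked:", is_mca_unchecked(mem, regs),
              "checked:", is_mca_checked(mem, regs))
```

With the function implemented, both variants read the real table and agree. With the function not implemented, the unchecked variant ends up reading the INT 1 vector byte and reports "MCA", which in this model is what causes the IRQ detection to be skipped.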