Weird 386sx Problem - VGA interference + divide by zero errors

chadbob2600

New Member
Joined
Jan 22, 2024
Messages
7
This is a Packard Bell 386sx 25MHz motherboard with integrated Oak VGA chipset. It has no battery damage.

Symptoms:

  • VGA noise/distortions that change with computer activity (ISA bus activity or floppy activity)
  • Smart Drive and Setver freeze the PC and give divide by zero errors
What I've Tried:
  • Powered with known good modern ATX power supply -- No change
  • Ran 5 passes of memtest86+ 2.1.1 -- No failures
  • Inspected VCC/VDD pins of VGA chipset, Video DRAM, and System DRAM. Ripple/noise looks normal to me? Measured < 50mV (see scope pics)
  • Inspected pixel clock output of RAMDAC (OTI-066) and it looks rock solid to me. (see scope pics)
  • Removed the 387 and the expansion SIMMs seen in the pics (the board has 2MB onboard) -- No change
Here's a short video of the VGA noise while sitting in the BIOS screen (One Drive link): https://1drv.ms/v/s!Aq_S4PNq1opGgptjeUk9OZsPYfTs6A?e=Q9NJCp

I'm not sure where to go next.

[Attached images: IMG_1615.jpg, image_50793473.JPG, image_67207425 (1).JPG, IMG_1602.jpg, IMG_1610.jpg]
 
Welcome to these forums.

Some thoughts:
- Motherboard (or some chip/s) being clocked higher than certified for? (Via a jumper setting?) (Maybe restore CMOS SETUP to factory defaults.)
- Maybe some chip deterioration that will no longer manifest if you clock the motherboard slower than what it is certified for?
 
This motherboard has no ability to modify the clock speeds. (Short of replacing crystals).
 
Could you disable the onboard VGA and see what a plug-in VGA board does?

Disabled the onboard VGA and tried a Trident ISA VGA card. No picture problems with this card. Divide by zero errors still present.

I had assumed the electrical noise issue and the system stability issue were related, but maybe not.
 
I see there is a jumper for telling if the 386 is pipelined or not. Have you tried toggling that?
This really feels like a speed or timing related error. It feels like anything that touches memory or the CPU cache pushes it into the weeds.
 
"divide by zero" is an error you normally get when code runs too fast. This error happens inside the CPU itself. If it is not overclocked, I would strongly suspect the CPU is faulty. Yes, that happens rarely, but it does happen. External components (mainly RAM and cache) can cause it as well by corrupting code, but then it would be irreproducible and random, and you checked the RAM already. So it seems the CPU is toast.
 
The distortion is going to be noise in the analog part of the VGA circuit, the part between the DAC and the monitor, including wherever the sync signals are generated from.

"code runs too fast" is a terrible take, IMO. if it passes memtest86 I think you can reasonably conclude there's no problem with the CPU or memory. there may be a hardware problem elsewhere, but I'd start by testing the assumption that whatever is on the hard drive isn't damaged. boot a known good DOS floppy and try that. I don't think setver or smartdrv exist in executable form on the MS-DOS 6.22 install disks (they'll be compressed) so if it's specifically setver and smartdrv you're concerned about, you'll want to prepare a floppy disk with known-good copies of those files.
 
In many cases, "Divide by zero" doesn't really mean that a divide has occurred. Software interrupt 0 is produced by a divide exception, but it is by no means the only way the CPU can produce that. It can be something as simple as corrupted or buggy code.

I'm assuming that you're booting a fresh copy of DOS from floppy...
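
To illustrate the point about INT 0 firing without an actual division by zero, here is a minimal Python sketch that models the 16-bit unsigned x86 DIV instruction. The names `div16`, `describe` and `DivideError` are made up for this example; it is only a model of the documented behaviour, not code from this machine. The CPU raises the divide exception both when the divisor is zero and when the quotient does not fit in the 16-bit destination, so a single corrupted register value is enough to trigger the message.

```python
from typing import Tuple


class DivideError(Exception):
    """Stands in for the CPU raising interrupt 0 (divide error)."""


def div16(dx: int, ax: int, divisor: int) -> Tuple[int, int]:
    """Model unsigned 16-bit DIV: DX:AX / divisor -> (quotient, remainder)."""
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    divisor &= 0xFFFF
    if divisor == 0:
        raise DivideError("divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        # The quotient must fit in AX; if it does not, the CPU raises INT 0.
        raise DivideError("quotient does not fit in 16 bits")
    return quotient, remainder


def describe(dx: int, ax: int, divisor: int) -> str:
    """Return a one-line summary of what the modelled DIV would do."""
    try:
        q, r = div16(dx, ax, divisor)
        return f"ok: quotient={q:#06x} remainder={r:#06x}"
    except DivideError as exc:
        return f"INT 0: {exc}"


print(describe(0, 100, 7))  # ordinary division
print(describe(1, 0, 1))    # non-zero divisor, but DX holds an unexpected value
print(describe(0, 1, 0))    # the literal divide by zero
```

The second call is the interesting one: the divisor is 1, yet the modelled CPU still reports a divide error because DX:AX holds 0x10000 and the result overflows AX.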
 

This is a custom DOS 6.22 boot disk image I made from original DOS 6.22 sources. Setver and Smart Drive work fine from this floppy on other computers.
 
You can write a little TSR that hooks INT 0 and displays the registers, PC and a few instructions around CS:IP. I may even have such a thing in my code hellbox from years ago.

What happens if you run DEBUG SETVER.EXE and then do a "g" at the prompt?
 
"code runs too fast" is a terrible take, IMO. if it passes memtest86 I think you can reasonably conclude there's no problem with the CPU or memory.
memtest does just that: it tests memory. It does not tell you whether the CPU is OK. I have had CPUs in the past that would crash when trying to enter protected mode. I also had one that could no longer drive the A20 line. Both seemed fine otherwise and were "working". Very subtle faults can emerge in a CPU from age, a factory defect, or ESD damage. I don't see the point of ignoring that the CPU might be bad just because memtest works...

Also, you got that "code runs too fast" wrong, obviously. I only said it's the most common cause for that error message (users of Turbo Pascal probably know), and may also emerge when overclocking. The point was that this is an error that happens inside the CPU.
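
For anyone who has not run into the Turbo Pascal case: as it is commonly explained, the CRT unit's startup code counts how many times a delay loop runs during one timer tick (about 55 ms) and then divides that count by 55 with a 16-bit DIV. The Python sketch below only works through that arithmetic. The constant and function names and the sample loop counts are hypothetical, chosen for illustration, and not measured on any real machine.

```python
TICK_MS = 55            # one PC timer tick is roughly 55 ms
MAX_QUOTIENT = 0xFFFF   # a 16-bit DIV can return at most 65535


def calibration_overflows(loops_per_tick: int) -> bool:
    """True if loops_per_tick / 55 would not fit in 16 bits (divide error)."""
    return loops_per_tick // TICK_MS > MAX_QUOTIENT


# Largest loop count that still divides without overflowing.
largest_safe = TICK_MS * MAX_QUOTIENT + (TICK_MS - 1)

print(largest_safe)                              # 3604479
print(calibration_overflows(200_000))            # slow CPU (hypothetical count)
print(calibration_overflows(largest_safe + 1))   # just past the limit
```

Once the CPU is fast enough to exceed that count in a single tick, the division traps and the program dies at startup, even though nothing ever divided by zero.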
 
So the weather changed here, and the temp in my garage dropped by almost 30 degrees. Guess what, the machine is working fine now.

I'm going to go over the board while it's cold with the fine tip on my hot air rework station and see if I can find the bad component.
 
In addition to "CPU runs too fast" I would also add "check the clock signal".

I've had a 486 board with a bad clock chip, causing the clock signal to be unstable and randomly spike - benchmark numbers were all over the place and the system was very unstable.
 
I didn't even think about a temperature sensitivity with this. Sometimes a chance occurrence is worth a week of troubleshooting.

And if the hot air is inconclusive, you can always bring it back to room temperature and use freeze spray.
 
Just to give an update on this thread...

The only heat related failure I could find was the Motorola RTC. After heating it up the mobo would lose the CMOS settings and complain about CMOS checksum failures. So I replaced it with one off a parts board.

I noticed the mobo had the same brand caps as were in the power supply. I had to recap the power supply to bring it back to life, so I figured I'd do the mobo, too.

Replaced all the electrolytics with Kemet Polymer low-ish ESR caps. None of the caps I pulled tested bad, but their ESR was about 3-4 ohms at 1 kHz. The Kemet caps were around 40 milliohms. The +5V ripple (avg measured on VCC pin of multiple ICs) went from around 50mV to 14mV.

Now the board seems to be stable and the video signal is much cleaner. So I'm not 100% sure what fixed it.

Another thing I noticed: the old Moto RTC pulled 160 microamps from the battery when the machine was off. That's a little high according to the datasheet. The replacement pulls 100 microamps. Maybe that implies it had some bad silicon inside with excessive leakage? Who knows.
 