
Turbo XT that suffers filesystem corruption when in 'turbo' mode (only)

kdr (Experienced Member; joined Sep 9, 2020; 118 messages; Wellington, New Zealand)
So this is a real head-scratcher...

I recently acquired a Logicraft 88-XT machine, which is a generic Turbo XT system with 640K of RAM and an NEC V20 processor (in a socket labelled "8088-2") that runs at either 4.77 MHz or 8 MHz. I can't find schematics or a manual for this machine, but the turbo switch appears to select between a 14.318180 MHz crystal and a 24.0 MHz crystal. [Not an XCO.]

I dropped in a known-good ST11R controller paired with a known-good ST-238R drive, did a brand new low-level format, installed MS-DOS 3.30, and everything was going fine until at some point I edited C:\CONFIG.SYS while in turbo mode. No errors or anything until after rebooting, when the system just failed to boot from the hard drive at all. CHKDSK from an MS-DOS 3.30 boot floppy spewed forth hundreds of errors: entire allocation chains were lost, IO.SYS / MSDOS.SYS were truncated, filenames were randomly corrupted, etc.

Twice more I formatted and reinstalled DOS. Both times everything was working perfectly, with lots of disk reading and writing and copying of files, until I would absentmindedly reboot and forget to switch out of turbo mode (the default). At which point the root directory and the FAT would get corrupted again.

Occasionally, even just reading from the disk in turbo mode would return incorrect data. Confirmed through the use of MD5SUM.EXE -- in normal mode the checksum was always correct, but in turbo mode the checksum was always wrong -- and each run through MD5SUM would return a different (but still incorrect) checksum. I created a RAM disk (RAMDRIVE.SYS) and confirmed that even in turbo mode, file checksums from the RAM disk were correct.
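The repeated-checksum check described above is easy to sketch. This is a modern Python analogue, not the MD5SUM.EXE that was run on the machine itself, and the function names `md5_of` and `repeated_md5` are made up for illustration.

```python
import hashlib


def md5_of(path, chunk_size=4096):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_md5(path, runs=5):
    """Hash the same file `runs` times and return the set of distinct digests.

    A healthy read path yields exactly one digest. More than one means the
    data came back differently on different reads.
    """
    return {md5_of(path) for _ in range(runs)}
```

A result set with more than one entry matches the symptom reported here: a different (wrong) checksum on every run points at the read path rather than at the stored data alone.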

Finally, and most perplexingly... Norton Disk Doctor 4.5 and SpeedStor 6.03 are both returning perfect results... even in turbo mode! I am about an hour into torturing the disk with SpinRite II's most aggressive pattern test, again in turbo mode, and again without observing even a single error yet.

Any suggestions on what to try next?

Is it possible there is some marginal RAM located exactly where DOS puts its disk buffers? [But if there is bad RAM, the RAM disk test ought to have failed as well, right?]

Perhaps Disk Doctor and friends are bypassing the BIOS routines and submitting commands directly to the ST11R controller? And therefore could it be some mysterious timing-related bug in MS-DOS 3.30 or in the BIOS?
 
More testing is needed. For example, you're seeing this corruption when updating the hard drive; do you see it when updating the floppy drive too? If so, I'd say you have RAM that isn't fast enough to work properly at the turbo speed. I'd also run a RAM test at both slow and fast speeds.
 
This simple disk performance test also has a "mediatest" function that I've found useful; it's a reliable tell for CF cards that have compatibility issues with my XTIDE/XT CF adapters. It might be worth trying it, since it'll operate entirely through the DOS/BIOS interface. See if it reliably fouls the drive at turbo speeds but doesn't at standard?

Turbo XTs are all over the map with regard to how cleanly they implement their higher clock speed, and XT hard disk controllers like the ST11R rely on DMA, which makes them pretty much the worst case scenario for causing problems if anything's at all sketchy.
 
Controller can't take the higher bus speed, surely?

There's a manual for the ST11R on Bitsavers that says what the minimum requirements are for bus pulse timings; you could probably do some math to work out whether 8 MHz is still within range. But there still might be devils in the details, like duty cycle, with regard to exactly how the particular clone implements the higher speed. There was a thread recently about how turbo XTs were sometimes "sloppy" about ensuring other peripherals were clocked correctly when turbo was engaged; it totally could be some kind of hardware or BIOS bug with DMA that only crops up in fast mode.

Might be safer to just toss an XTIDE or XT-CF card into this one?
 
Excellent, that DISKTEST program you linked to looks like just the ticket.

I think your suggestion that it's a DMA issue is a good hunch, and something I will investigate further. I get the feeling that advanced low-level tools like SpeedStor and SpinRite are bypassing much of the 'normal' DOS/BIOS interface.

Also thanks to everyone else for the suggestions so far! What's a good RAM testing program for an XT class machine? It only has the standard 640KB (plus 32KB at B000 for the monochrome display adapter) and I want something which is able to relocate itself so that it can test each and every address... especially critical to check the first ~16KB where DOS holds all of its buffers and pointers and other control information.
 
Check-it 3.0 does a pretty comprehensive test (if you specifically select the extended test, the quick version probably isn't particularly definitive), and some of its other diagnostics might possibly come in handy.
 
I've done further testing, and here's where I am at:

(1) I ran CheckIt 2.1 (which fits on a 360K floppy) in turbo mode and all system tests (including RAM pattern tests) passed.
(2) DISKTEST.EXE is able to trigger the data corruption issue in turbo mode, even when reading/writing to a floppy disk.
(3) Pulled all cards from the system and ran with nothing except a known-good Hercules clone and a known-good floppy controller, and DISKTEST is still failing when in turbo mode.

It's still very strange that Norton Disk Doctor, SpeedStor, and SpinRite all passed with flying colours even in turbo mode.

So it must be something to do with the DMA controller or the ISA bus interface logic, yeah?

I measured the CLK input to the NEC D8237AC-5 DMA controller chip on the motherboard using my el cheapo multimeter's frequency counter. In normal mode it measures a 4.771 MHz CLK input with a 54% duty cycle; in turbo mode it measures a 3.999 MHz CLK input with a 21% duty cycle. The datasheet specifies a minimum CLK period of 200ns (=5.00 MHz) with a minimum CLK high of 80ns and a minimum CLK low of 68ns. If the turbo mode clock input to the DMA controller really has a 21% duty cycle, the clock pulse might be as narrow as ~52.5ns (21% of a 250ns period), which is under both minimums... but do I trust my multimeter's measurement? [Alas I don't have access to any better test equipment and this is already way out of my depth!]
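The pulse-width arithmetic can be checked with a short sketch. It uses only the frequencies, duty cycles and datasheet minimums quoted in the post; the function names are invented for illustration, and it assumes the multimeter's duty-cycle figure is the fraction of the period the signal spends high (if the meter reports the low fraction instead, the same narrow pulse would land on the low side).

```python
def clock_pulse_widths(freq_mhz, duty_high_pct):
    """Return (period_ns, high_ns, low_ns) for a clock with the given
    frequency and high-time duty cycle."""
    period_ns = 1000.0 / freq_mhz
    high_ns = period_ns * duty_high_pct / 100.0
    return period_ns, high_ns, period_ns - high_ns


# Minimums quoted in the post from the 8237A-5 datasheet (nanoseconds).
MIN_PERIOD_NS, MIN_HIGH_NS, MIN_LOW_NS = 200.0, 80.0, 68.0


def check_clock(freq_mhz, duty_high_pct):
    """Return the names of the violated limits (empty list = within spec)."""
    period, high, low = clock_pulse_widths(freq_mhz, duty_high_pct)
    problems = []
    if period < MIN_PERIOD_NS:
        problems.append("period")
    if high < MIN_HIGH_NS:
        problems.append("high")
    if low < MIN_LOW_NS:
        problems.append("low")
    return problems


# The two measurements reported in the post.
for label, freq, duty in [("normal", 4.771, 54), ("turbo", 3.999, 21)]:
    period, high, low = clock_pulse_widths(freq, duty)
    print(f"{label}: period {period:.1f} ns, high {high:.1f} ns, "
          f"low {low:.1f} ns, violations: {check_clock(freq, duty) or 'none'}")
```

With these inputs the normal-mode clock is within all three limits, while the turbo-mode clock has a high time of roughly 52.5 ns, short of the 80 ns minimum, provided the multimeter reading can be trusted.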

I've been working from the assumption that this machine must have worked just fine in turbo mode at some point in the past. Is that a valid assumption to make here? I'm aware that some turbo XT systems would drop out of turbo mode during the BIOS disk access routines. Perhaps this machine used to do that, and something has changed to cause the BIOS to no longer switch out of turbo automatically?

I guess my next step is to try and write a minimal test case that demonstrates the problem and go from there.... as always, thanks for the suggestions so far!
 
Also, I haven't been able to find much at all about this machine online. Perhaps some photos will help someone to recognize it?

[Attached photos: IIMG_20200926_155316.jpg, IIMG_20200926_155336_1.jpg, IIMG_20200926_155353.jpg]

The motherboard says "LOGI PC88XT" (obscured by the ISA cards).

Unfortunately the hard drive in the machine refuses to spin up when power is applied, i.e. totally dead, so no chance of finding useful drivers or information via that avenue...
 
Thank you very much for the link to DISKTEST!
My breadboard 8088 computer tends to corrupt its filesystem a lot, so that program will definitely help me debug the issue.
 
Assuming that nobody tried to modify or overclock the board, I would think that bad decoupling caps could also create problems with these old DRAM chips. If they dry out or leak (which is likely in 20 or 30 year old boards), they can create problems like signals not meeting their required noise margins, and therefore create memory issues.
 
I've seen old DRAM that will test good on my Chroma tester, but only at access times *longer* than their rated speed. I.e. sometimes the old DRAM can fail outright, but other times it just gets "slower"
 
This week I have acquired a second Turbo XT machine that, through sheer luck, has a motherboard which is virtually identical to the one in my "problem" machine. (It's a 4.77Mhz/10Mhz board instead of a 4.77Mhz/8Mhz board.)

And I can now confirm that all of my ISA cards work perfectly fine in the new machine (at 10Mhz) and that I can't reproduce any filesystem corruption (on either floppy or hard disk) on that machine, despite using the exact same testing programs and the exact same drives.

I also ruled out a problem with the ROM BIOS, because [thanks to the motherboards being basically identical] it was possible to boot the problem machine using the BIOS chip from the known-good machine and yet still reproduce the filesystem corruption.

So I'm convinced that it is a hardware issue causing the corruption. Does anyone have suggestions or ideas as to what kind of hardware failure might cause MS-DOS to experience read/write errors and yet allow diagnostic tools like SpinRite to execute without error?
 
I'm still keen to hear suggestions on what might be causing this issue.

I have managed to create a very small test case that exercises the problem. It just calls INT 13h in a tight loop to write sectors to a floppy and then read them back and check for correctness. It runs very reliably at 4.77Mhz (as expected). It exhibits errors at 8Mhz (again as expected).

The errors are quite frequent if I ask INT 13h to write multiple sectors (e.g. 4+ sectors in a single request) and the errors are very infrequent when writing single sectors. If I modify the test case to write the sectors once, and then read them back in a loop, there are no errors even at 8Mhz. So the corruption is only happening when writing sectors.
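The shape of such a write-then-verify test can be sketched without the BIOS. This is not the poster's program, which called INT 13h directly on the XT; here disk access is abstracted behind two caller-supplied callables, and `run_write_verify` and `FakeDisk` (with its deliberately simple fault model) are hypothetical names used only to make the sketch runnable on a modern machine.

```python
import random


def run_write_verify(write_sectors, read_sectors, sectors_per_request,
                     passes=100, sector_size=512, seed=0):
    """Write random data, read it back, and count mismatching requests.

    `write_sectors(lba, data)` and `read_sectors(lba, count)` stand in
    for the BIOS disk calls used by the real test.
    """
    rng = random.Random(seed)
    bad = 0
    for i in range(passes):
        lba = i * sectors_per_request
        data = bytes(rng.randrange(256)
                     for _ in range(sectors_per_request * sector_size))
        write_sectors(lba, data)
        if read_sectors(lba, sectors_per_request) != data:
            bad += 1
    return bad


class FakeDisk:
    """In-memory stand-in for a drive (hypothetical fault model)."""

    def __init__(self, sector_size=512, faulty_writes=()):
        self.sector_size = sector_size
        self.faulty_writes = set(faulty_writes)  # write indices to corrupt
        self.sectors = {}
        self.write_count = 0

    def write(self, lba, data):
        buf = bytearray(data)
        if self.write_count in self.faulty_writes:
            # Overwrite the first byte that is not already 0x20 with 0x20.
            for offset, value in enumerate(buf):
                if value != 0x20:
                    buf[offset] = 0x20
                    break
        self.write_count += 1
        for n in range(len(buf) // self.sector_size):
            start = n * self.sector_size
            self.sectors[lba + n] = bytes(buf[start:start + self.sector_size])

    def read(self, lba, count):
        return b"".join(self.sectors[lba + n] for n in range(count))


disk = FakeDisk(faulty_writes={2, 5})
print(run_write_verify(disk.write, disk.read, sectors_per_request=4, passes=10))
```

Running the same loop with `sectors_per_request=1` and then with 4 or more is the comparison described above. Keeping the write and the read-back as separate callables also makes it easy to write once and re-read in a loop, which is how the fault was narrowed down to writes.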

What's most curious is the nature of the corruption: it's always a *single* byte that gets corrupted, and it's *always* corrupted to 0x20. Every single time that the corruption strikes, it randomly changes one single byte of the sector to 0x20.
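A small helper makes this kind of pattern easy to spot when comparing what was written with what was read back. This is an illustrative sketch, not the poster's code; `find_corruption` and `summarize` are made-up names.

```python
def find_corruption(written, readback):
    """Return a list of (offset, expected, actual) for every differing byte."""
    if len(written) != len(readback):
        raise ValueError("buffers must be the same length")
    return [(offset, w, r)
            for offset, (w, r) in enumerate(zip(written, readback))
            if w != r]


def summarize(diffs):
    """Report how many bytes differ and which values they were changed to."""
    return {
        "count": len(diffs),
        "values": sorted({actual for _, _, actual in diffs}),
    }


# Example: one byte of a 512-byte sector replaced by 0x20.
sector = bytes(range(256)) * 2
damaged = bytearray(sector)
damaged[300] = 0x20
print(summarize(find_corruption(sector, bytes(damaged))))
```

Collecting the offsets over many failing runs would also show whether the damaged byte tends to fall at particular positions in the transfer, which might help separate a DMA timing fault from a data-path fault.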

Again, I want to stress that the CPU and memory are 100% okay at 8Mhz. I can calculate MD5 hashes of files in a RAM disk all day long in turbo mode with zero errors. And tools that bypass the BIOS and access the disk controllers directly [such as SpinRite] also work 100% of the time with zero errors.

Where should I look next? Is it time to fire up Turbo Debugger and single step my way through the BIOS INT 13h handler?
 
The only thing common to all the scenarios you presented that I can think of is either memory, DMA, or the NEC765 controller. Since you tested the memory successfully, DMA and the NEC765 are what's left. If they're both clocked by the motherboard crystal, then one of them isn't rated at higher speeds.

If your floppy controller is on a card, and you moved it to the other system and everything worked fine, that leaves the DMA controller. Maybe try replacing it... or just make a note that the board is only stable at 4.77 MHz and leave it at that.

These kinds of wonky problems were common in the mid- to late-1980s when the market was flooded with cheap boards, btw.
 
I have plenty of other XT machines to play with, so it's really more about the troubleshooting process at this point. It's such a weird glitch! I would like to be able to identify the specific flaw which is causing this. Especially because this machine *always* boots up in turbo mode, and I've already destroyed three FAT tables by forgetting to disable the turbo every. single. time. it. boots. :)

Thanks again for the suggestions. It's nice to have a second set of eyes look at an issue.

The DMA controller is clocked at 4.77Mhz in normal mode and 4.00Mhz (8.00Mhz CPU CLK/2) in turbo mode.
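The clock figures in this thread fit a simple chain, sketched below. The divide-by-2 for the DMA clock in turbo mode is stated in the post; the divide-by-3 from crystal to CPU clock is an inference from the quoted numbers (14.31818 MHz giving 4.77 MHz, 24.0 MHz giving 8 MHz), not something confirmed for this particular board. The function names are illustrative.

```python
def cpu_clock_mhz(crystal_mhz, divider=3):
    """CPU clock derived from the crystal (divide-by-3 is an assumption)."""
    return crystal_mhz / divider


def dma_clock_mhz(cpu_mhz, turbo):
    """DMA controller clock: CPU clock in normal mode, CPU clock / 2 in turbo,
    as reported in the post."""
    return cpu_mhz / 2 if turbo else cpu_mhz


for label, crystal, turbo in [("normal", 14.31818, False), ("turbo", 24.0, True)]:
    cpu = cpu_clock_mhz(crystal)
    print(f"{label}: crystal {crystal} MHz, CPU {cpu:.2f} MHz, "
          f"DMA {dma_clock_mhz(cpu, turbo):.2f} MHz")
```

On these figures the DMA controller actually runs slower in turbo mode than in normal mode, so the raw frequency is comfortably inside its rating. That leaves the shape of the clock (duty cycle, glitches while switching) as the more likely suspect.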

Every floppy controller I try in the system exhibits the corruption. Including the FDC on the super I/O card that was originally supplied with the system by the manufacturer. The BIOS also identifies itself with the name of the system ("LOGI PC-88XT") so it surely must be the original BIOS supplied by the manufacturer.

The filesystem corruption in turbo mode is as reliable as clockwork, and I refuse to believe that the machine exhibited this issue when it was originally sold. It would have been noticed within hours of installing MS-DOS on the hard drive.

I'm slowly single stepping my way through the BIOS INT 13h handler. Nothing out of the ordinary yet. Seems that the Phoenix BIOS in this machine is a very good replica of the original IBM XT BIOS. I doubt there are any bugs lurking here. I suppose the next step is a small program to directly exercise the 8237A.
 
One thing to check - make sure any socketed chips are actually rated for the higher speeds. I can imagine someone fiddling with a machine like that and swapping around chips from a 4.77mhz board. Then they see it boots, so it must work, right?
 
If the DMA controller is really clocked at 4 MHz in turbo mode, I wonder if the machine has some hackish circuitry to slow the system down to the DMA clock whenever anything DMA-related is asserted, and some part of it that relied on an analog delay to trigger properly has stopped working because of age. (Bad capacitor or other usual suspect.) As a result, when the system is "throttling" up and down you're getting an ugly clock waveform or too-short duty cycle somewhere.

If you just want to use the system for something a simple fix would be to throw an XT-CF card in it, one of the really brain-dead 8-bit ones. They *only* use polled I/O, don't touch the DMA controller at all. (I built them into cards for my Tandy 1000 EX and HX because, well, those machines don't even *have* DMA controllers.)

If the machine has a physical turbo switch to change speeds you could whip up a simple circuit that'd put the machine in 4.77mhz mode when the floppy drive motors are on... ;)
 
You laugh, but many 8086 and 286 BIOSes did this on purpose, quite intentionally, to allow speed-sensitive copy-protection schemes to work.
 