
How are MFM/RLL defect lists supposed to work?

After retrieving data off of a 42MB Miniscribe, I decided to low-level format it. I used the hard drive utility that came with my DOS, combined with the printed defect list on the top of the drive, to do this. However, things didn't go the way I expected.

It was my understanding that once you entered the defect list into the utility and performed the LLF, the drive was then ready for use with the bad cylinders mapped out or something. What actually happened was this, in this order:

  1. Executed menu option: Performed low-level format. Took 18 minutes (normal).
  2. Executed menu option: Enter defect list
  3. Executed menu option: "Format entries in defect list". (Just the defects were "formatted", operation over in 10-15 seconds)
  4. Booted off of DOS disk, ran fdisk, ran format. Format choked on the same cyl/heads that were in the defect list, and marked them as bad.

The part that was unexpected, for me, was format.com choking on the entries in the defect list. I thought the point of the defect list was to somehow prepare the drive such that those entries were never seen by, or given to, the operating system (i.e. "mapped out" or similar terminology). Were my assumptions incorrect? Was the above sequence normal and expected?

If my assumptions are incorrect, what's the point of the defect list if the OS is just going to run into them and map them out anyway?
 
That's exactly the way things are supposed to work, Jim.

Entering the defect list causes the LLF routines to reformat the track, adding a special "bad sector" flag so that the controller returns a status of 128. Note that this is not the BIOS status code, but the controller's; the status is translated by the BIOS into something DOS can understand.
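A toy model of that status path, purely for illustration (the dict-based "sectors" and helper names are mine; 0x80 does match the bad-block bit in a WD1003-style error register, and 0x0A is the standard INT 13h "bad sector flag detected" status):

Code:
CTRL_BAD_BLOCK = 0x80   # controller error register: bad-block bit set
BIOS_BAD_SECTOR = 0x0A  # INT 13h status: bad sector flag detected

def controller_read(sector):
    # The LLF wrote a "bad" flag into this sector's ID field; the
    # controller refuses the sector and reports error 128 (0x80).
    if sector.get("bad_flag"):
        return CTRL_BAD_BLOCK, None
    return 0x00, sector["data"]

def bios_int13_read(sector):
    # BIOS layer: translate the controller's status into an INT 13h
    # code that DOS understands.
    status, data = controller_read(sector)
    if status == CTRL_BAD_BLOCK:
        return BIOS_BAD_SECTOR, None
    return 0x00, data

good = {"data": b"\x00" * 512}
flagged = {"data": b"\x00" * 512, "bad_flag": True}
print(hex(bios_int13_read(good)[0]))     # -> 0x0
print(hex(bios_int13_read(flagged)[0]))  # -> 0xa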

The point of doing this is so that a sector (or track; some controllers allow the IAM to contain a bad track flag) is never used for anything. Remember that a controller can correct certain "soft" errors if an appropriate ECC is used. While this can help to recover data from an error, you don't want to rely on it for sectors not even written yet.

So yes, your controller's doing exactly what it should. Later controllers can re-map error sectors to a spare on another track.
 
... what's the point of the defect list if the OS is just going to run into them and map them out anyway?
It lets you manually mark the marginally bad sectors/tracks found at the factory, which are likely to cause errors even though less exacting test programs, and the LLF and DOS format utilities, might think they're fine.

The LLF marks the tracks/sectors as bad, but then DOS has to mark those sectors (plus any others that it may find) as unavailable in the FATs.
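In FAT terms it looks roughly like this sketch; FAT12's reserved bad-cluster value 0xFF7 is real, but the toy FAT and helper names are mine:

Code:
FAT12_BAD = 0xFF7    # reserved FAT12 value: cluster is unusable
FAT12_FREE = 0x000

fat = [FAT12_FREE] * 64          # toy FAT, one entry per cluster

def mark_bad(cluster):
    # What FORMAT does when a cluster fails its verify (or was
    # already flagged bad by the controller at LLF time).
    fat[cluster] = FAT12_BAD

def first_free_cluster():
    # The allocator simply never offers a cluster whose entry is 0xFF7.
    for n in range(2, len(fat)):     # entries 0 and 1 are reserved
        if fat[n] == FAT12_FREE:
            return n
    raise IOError("disk full")

mark_bad(2)                  # say the defect fell in cluster 2
print(first_free_cluster())  # -> 3; cluster 2 is never handed out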
 
Thank you both for the confirmation; glad to know I wasn't going crazy.

One thing I found pretty damn neat about the defect list is that it not only lists the cylinder and head of a defect, but also the BYTE that has the problem. That's some pretty impressive equipment they must have had at the factory!
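That byte figure is typically "bytes from index" (BFI), i.e. how far past the index pulse the flaw was observed. A back-of-the-envelope conversion to a sector slot, using generic ST-506 MFM assumptions rather than anything from this particular drive's label:

Code:
# Generic ST-506 MFM assumptions: 5 Mbit/s data rate, 3600 RPM,
# 17 sectors per track.
RAW_BYTES_PER_TRACK = 5_000_000 // 8 // 60   # ~10416 raw bytes/rev
SECTORS_PER_TRACK = 17
RAW_BYTES_PER_SLOT = RAW_BYTES_PER_TRACK / SECTORS_PER_TRACK  # ~612
# (each ~612-byte slot is 512 data bytes plus ID field, gaps and ECC)

def defect_slot(bytes_from_index):
    # Which physical sector slot (0-based, counted from index) the
    # defective byte falls in.
    return int(bytes_from_index // RAW_BYTES_PER_SLOT)

# A hypothetical label entry like "cyl 305, head 2, byte 7412":
print(defect_slot(7412))   # -> slot 12 on that track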
 
Some controllers did map them out, the ST11 for example. But marking them bad should enable DOS to mark them bad during FORMAT with no delay.
 
Could a bad sector cause this kind of problem :

Say you have a 10MB drive with DOS on it and are trying to copy a 6MB file onto it. There is a defective sector somewhere in the first 6MB of the drive. Can the file be designated to use non-contiguous sectors? Otherwise you could not copy the file to that drive.
 
But marking them bad should enable DOS to mark them bad during FORMAT with no delay.


Oh, there was a delay -- a very noisy delay. It reacted just like any normal bad sector; head recalibrate three times, three attempts. It only startled me because I wasn't expecting it, hence the question that started this thread.
 
Smart controller BIOSes know what to do when they run across one of these. Unfortunately, not all hard disk BIOSes are very smart.

There were a couple of BIOS add-ons that would divert bad sectors to known-good areas. But then hard drives got smarter, so there was no need for BIOS intervention and the add-ons vanished. Similarly, BIOSes no longer had to be smart about the "bad sector" issue.

What puzzled me about the "format and assume good thereafter" approach of DOS was that it makes little sense. Flaws can spread on a track or be pattern-sensitive (which is why it's a good idea to flaw an entire track when it contains a bad sector), so that cluster that looked fine when you formatted the drive is no longer competent to store data. What would make more sense is to write and then read back verify the first time a sector is written. If it's bad, mark it so, then reallocate a new cluster to hold the data and repeat.

But DOS was very fixed in its approach.
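For concreteness, the policy I have in mind would be something like this toy sketch; the in-memory "disk" and all the names are illustrative:

Code:
class FlakyDisk:
    # Toy disk where one cluster silently corrupts anything written
    # to it, standing in for a flaw that only shows up under real data.
    def __init__(self, clusters, flaky):
        self.store = [b""] * clusters
        self.flaky = flaky
    def write(self, n, data):
        self.store[n] = b"??" if n == self.flaky else data
    def read(self, n):
        return self.store[n]

def write_verified(disk, free_clusters, bad, cluster, data):
    # Write, read back immediately, and migrate to a fresh cluster
    # (retiring the old one) whenever the verify fails.
    while True:
        disk.write(cluster, data)
        if disk.read(cluster) == data:
            return cluster                # verified good
        bad.add(cluster)                  # flaw surfaced on first use
        cluster = next(c for c in free_clusters if c not in bad)

disk = FlakyDisk(8, flaky=3)
where = write_verified(disk, range(3, 8), set(), 3, b"payload")
print(where)   # -> 4: cluster 3 failed its read-back and was retired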
 
Could a bad sector cause this kind of problem :

Say you have a 10MB drive with DOS on it and are trying to copy a 6MB file onto it. There is a defective sector somewhere in the first 6MB of the drive. Can the file be designated to use non-contiguous sectors? Otherwise you could not copy the file to that drive.
Normally that's not a problem; DOS will just skip over the bad sector (block, actually). After a while most large, active files will become fragmented anyway, which is why you should run a defrag program from time to time to put all the file fragments back together contiguously and avoid the time spent jumping all over the disk. I kinda enjoy watching defrag in action, and of course it also shows you the bad blocks.
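A toy FAT chain shows why the 6MB file still fits; the 0xFF7/0xFFF reserved FAT12 values are real, the rest is a sketch:

Code:
BAD, FREE, EOF = 0xFF7, 0x000, 0xFFF   # reserved FAT12 entry values

fat = {n: FREE for n in range(2, 12)}  # toy FAT: clusters 2..11
fat[5] = BAD                           # the defect fell in cluster 5

def store_file(clusters_needed):
    # Chain together the first free clusters, routing around bad ones;
    # each FAT entry points at the next cluster of the file.
    chain = [n for n in sorted(fat) if fat[n] == FREE][:clusters_needed]
    for here, nxt in zip(chain, chain[1:]):
        fat[here] = nxt
    fat[chain[-1]] = EOF
    return chain

print(store_file(6))  # -> [2, 3, 4, 6, 7, 8]: non-contiguous, but usable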

The one place where a bad sector can cause problems is if you're creating an exact copy of a disk from an image.
 
What puzzled me about the "format and assume good thereafter" approach of DOS was that it makes little sense. Flaws can spread on a track or be pattern-sensitive (which is why it's a good idea to flaw an entire track when it contains a bad sector), so that cluster that looked fine when you formatted the drive is no longer competent to store data. What would make more sense is to write and then read back verify the first time a sector is written. If it's bad, mark it so, then reallocate a new cluster to hold the data and repeat.

I think the obvious answer was the speed tradeoff involved in doing so, and maybe they were trying to keep the memory footprint down (such code, plus error reporting, would have been at least another 1K). Or, maybe they simply didn't think about it, but I find that depressing to think about.

A partial workaround in DOS 2.x and later is to turn on VERIFY, i.e.:

Code:
VERIFY=ON

...which will then read back everything it writes (same as the /V switch in COPY). It won't be smart about marking bad clusters or anything, but at least you get immediate confirmation of whether or not your data was written properly.
 
VERIFY=ON became pretty much irrelevant early on, once IDE and SCSI drives came with internal caches. When you go to verify on these drives, you actually read the cache and really don't have a clue as to what actually made it to the platters. I've seen the futility of this when writing floppies under Windows NT and later; VERIFY is simply ignored as far as I can tell.

I've often wondered why hard drive manufacturers never followed the lead of tape drives (DDS, DLT, etc.), where a read-after-write is performed automatically by a specially-constructed head and the data is automatically rewritten if the verify fails. I suspect the added complication in head structure and the additional electronics needed might have something to do with it. After all, RAID is in, and the "ID" stands for "inexpensive disks". Still, there could be a command that enables a mode that says "if you write to this cylinder, waste a rev and verify the track before moving off-cylinder".
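Purely as a thought experiment, such a mode might behave like this sketch; no real drive command set is being modeled, and every name here is invented:

Code:
class DeferredVerifyDrive:
    # Hypothetical drive mode: remember which heads were written on
    # the current cylinder, and spend one revolution per dirty track
    # verifying it before honoring a seek away.
    def __init__(self):
        self.cyl = 0
        self.dirty = set()

    def write(self, head, data):
        self.dirty.add(head)    # verify this track before leaving
        # ... the actual write would happen here ...

    def seek(self, new_cyl):
        for head in sorted(self.dirty):
            print(f"verify cyl {self.cyl} head {head} before moving off")
        self.dirty.clear()
        self.cyl = new_cyl

d = DeferredVerifyDrive()
d.write(0, b"...")
d.write(2, b"...")
d.seek(150)   # verifies heads 0 and 2 on cylinder 0, then seeks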
 
So what exactly is the disadvantage of not entering the defect list when doing the low-level format, and just letting DOS FORMAT catch and flag off the bad sectors?

Maybe I'm just not using the really fancy ones which attempt to remap the bad tracks, but I've never seen an MFM/RLL controller make the as-entered bad tracks "blind to the user" -- both the low-level format and DOS FORMAT attempt to format them anyway, and of course run into errors when doing so. That makes it seem like a waste of time to type in the defect list, if it's apparently just going to be ignored anyway?
 
The problem is that FORMAT doesn't take into account pattern-sensitivity, nor "correctable errors". So a drive can pass DOS FORMAT with flying colors and then go kablooie after you've written important data to the drive. In other words, with "soft" and "correctable" errors, some data patterns won't show a problem, but others will. You really do want to remove sectors with problems exposed by surface analysis permanently from consideration by DOS.

DOS FORMAT is very simple-minded in that respect.
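To make the difference concrete: surface analysis hammers each sector with several stress patterns and fails the sector if any one of them doesn't read back, while FORMAT effectively only proves a sector can hold whatever a fresh format writes. A sketch, with an illustrative (not authoritative) pattern set:

Code:
PATTERNS = [b"\x00", b"\xFF", b"\x55", b"\xAA", b"\x6D", b"\xB6"]

def scan_sector(write, read, size=512):
    # Pass only if the sector faithfully holds every test pattern;
    # a single pattern can miss a pattern-sensitive flaw.
    for p in PATTERNS:
        buf = p * size
        write(buf)
        if read() != buf:
            return False
    return True

cell = {}   # a perfect in-memory stand-in for one sector
print(scan_sector(lambda b: cell.update(data=b), lambda: cell["data"]))
# -> True; a marginal sector might pass b"\x00" yet fail b"\x55"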
 
DOS FORMAT is very simple-minded in that respect.

Yes, that's why I follow it up with the surface scan of Norton Disk Doctor or ScanDisk.

I'm at least glad to have a totally error-free ST-225 in my 5150... it has no entries on the factory defect list, and no bad sectors detected by any software. And hopefully it'll stay that way for a good long time! :)
 
I like referring to this post, which is not only justified criticism of SpinRite, but sheds some light on why the defect lists are so awesome (the defects are found with much more sensitive equipment than the consumer could ever have): http://groups.google.com/group/comp.dcom.xdsl/msg/9aeee32323c2978e?dmode=source&hl=en

Specifically this bit:

Worse, Steve encouraged people to use SpinRite to "recover" areas that had been detected and marked as defective at the factory, a bad idea that leads to more failures in the long run, since end user controllers are not as sensitive as factory test equipment -- they are simply incapable of the kind of thorough testing done at the factory. Then of course SpinRite would be "needed" again to "fix" those failures, a self-fulfilling prophecy.

Since my eyes were opened, I only use SpinRite for two purposes now:
  1. Scan a newly-acquired drive for errors and mark out bad clusters so that I don't get any errors when I archive the drive
  2. Perform the low-level format interleave test to determine what interleave is best once I decide to LLF it myself (see the sketch after this list)

...but that's it. I never even think of reclaiming bad areas on a drive.
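For anyone curious what the interleave test is actually optimizing, here's a sketch; 17 sectors/track is the usual MFM figure, and the layout routine is mine:

Code:
SECTORS = 17   # typical MFM track

def layout(interleave):
    # Physical slot order of logical sectors 1..SECTORS: each logical
    # sector is placed 'interleave' slots after the previous one,
    # sliding forward if the slot is already taken.
    slots = [None] * SECTORS
    pos = 0
    for logical in range(1, SECTORS + 1):
        while slots[pos % SECTORS] is not None:
            pos += 1
        slots[pos % SECTORS] = logical
        pos += interleave
    return slots

print(layout(1))   # 1,2,3,...: fastest, but only if the CPU keeps up;
                   # miss one sector and you eat a whole extra rev
print(layout(3))   # 1,7,13,2,8,14,...: the classic slow-XT compromise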
 
Yes, that's why I follow it up with the surface scan of Norton Disk Doctor or ScanDisk.
They may be a little better at handling errors, but since what they see is 'filtered' through the controller, they're not going to be much better at finding errors than DOS is, and certainly not as good as the factory's specialized equipment. Ignore factory defects at your own risk; chances are very good that they'll bite you at the worst moment.

DOS's FORMAT is not really 're-testing' the areas marked bad, i.e. if they're marked bad they will always be bad even if DOS would have found them OK otherwise.
 
DOS's FORMAT is not really 're-testing' the areas marked bad, i.e. if they're marked bad they will always be bad even if DOS would have found them OK otherwise.

But what would happen if I enter a track into the LLF defect list which is not actually bad? Will FORMAT be forced to declare it as a bad sector, even though it's actually good?
 
But what would happen if I enter a track into the LLF defect list which is not actually bad? Will FORMAT be forced to declare it as a bad sector, even though it's actually good?
Exactly! That's pretty well the point of this whole thread: DOS essentially 'takes the word of' the LLF routine wherever a sector/track was marked as bad. The assumption is that the factory is much better equipped to find marginal spots on the disk than your computer, which can only know whatever the disk controller can find and pass on, or that you actually found an intermittent error yourself that you want to permanently lock out (and effectively add to the list).
 