
Fixing bad sectors caused by a crash during hard drive write

cr1901

Problem Statement
Yesterday, I ran CHECKIT on my PC AT hard disk[SUP]1[/SUP] running DOS 5.0 and got a pleasant surprise:

Code:
=== HARD DISK 0 (C:) ===

Test Configuration:
Cylinders: 614
Heads: 4
Sectors: 17
Capacity: 21,377,024 Bytes

Test Results:
Controller Diagnostic Test..................................Passed
Linear Read.................................................FAILED ***
Butterfly Read..............................................Aborted

Cyl  Head Severity Test        Notes
---- ---- -------- ----------- -----------------------------------

 139    2 MAJOR    Linear      DOS File. Drive C:.
 139    3 MAJOR    Linear      DOS File. Drive C:.
 140    3 MAJOR    Linear      DOS File. Drive C:.
 157    1 MAJOR    Linear      DOS File. Drive C:.
 334    2 OK       Linear      Marked by Low Level Format.
 565    1 OK       Linear      Marked by Low Level Format.


*** END TESTS: 4 ERRORS ENCOUNTERED ***

I'd never had trouble with this hard disk until recently; track 71 occasionally has a soft ECC error, but it disappears. It's definitely possible the drive is on its way out after 37 years. But I'm still kinda attached to the drive, seeing as it's the original and one of the few notorious CMI drives still working.

Debugging
I don't know what the "DOS File." error means, so I decided to dig deeper. Here's what I found by running a bunch of int 0x13 reads in DEBUG and looking at the error code (addresses are cylinder, head, sector):
  • 139, 2, 15 - Cannot find address mark
  • 139, 2, 16 - Cannot find address mark
  • 139, 2, 17 - CRC Error
  • 139, 3, 1 - CRC Error
  • 157, 1, 4 - CRC Error
  • 157, 1, 5 - Cannot find address mark
  • 157, 1, 6 - CRC Error
  • 157, 1, 7 - Cannot find address mark
I can probably fix the CRC Errors by zeroing out the sectors, but "cannot find address mark" would require me to low-level format the track.
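
For anyone repeating the int 0x13 reads, here is a rough Python sketch of how a cylinder/head/sector address is packed into the registers for INT 13h function 02h (read sectors), plus a lookup for a few of the status codes the BIOS returns in AH. The function name `pack_chs` and the `STATUS` table are made up for illustration, the table is deliberately partial, and drive 0x80 (first hard disk) is an assumption.

```python
# Illustrative sketch: register layout for BIOS INT 13h, AH=02h (read sectors).
# pack_chs and STATUS are hypothetical helper names, not part of any tool.

# Partial table of INT 13h status codes (value returned in AH on error).
STATUS = {
    0x00: "No error",
    0x02: "Cannot find address mark",
    0x04: "Sector not found",
    0x10: "CRC/ECC error",
    0x11: "ECC corrected data error",
    0x40: "Seek failed",
    0x80: "Timeout (drive not ready)",
}

def pack_chs(cyl, head, sector, drive=0x80, count=1):
    """Return (AX, CX, DX) for reading `count` sectors at cyl/head/sector."""
    if not (0 <= cyl <= 1023 and 0 <= head <= 255 and 1 <= sector <= 63):
        raise ValueError("CHS address out of range for INT 13h")
    ax = (0x02 << 8) | count           # AH = function 02h, AL = sector count
    ch = cyl & 0xFF                    # CH = low 8 bits of the cylinder
    cl = ((cyl >> 2) & 0xC0) | sector  # CL = cylinder bits 8-9 in bits 6-7, sector in bits 0-5
    cx = (ch << 8) | cl
    dx = (head << 8) | drive           # DH = head, DL = drive (0x80 assumed)
    return ax, cx, dx

if __name__ == "__main__":
    ax, cx, dx = pack_chs(139, 2, 15)
    print("AX=%04X CX=%04X DX=%04X" % (ax, cx, dx))  # AX=0201 CX=8B0F DX=0280
    print(STATUS[0x02])
```

ES:BX still has to point at a 512-byte buffer before the call; that part is left out here.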

Next, I tried to figure out which file these sectors belonged to. Based on the BPB[SUP]2[/SUP], I was able to figure out that CHS 139, 2, 15 is the start of FAT cluster 0x928, and CHS 157,1,4 is the start of cluster 0xa53. Each cluster is 4 sectors, and I confirmed that the sectors adjacent to these clusters read fine. By doing more int 0x13 reads to search the FAT using DEBUG's "s" command, I found the following FAT cluster chain:

Code:
0x24d => 0x24E => 0x25d => 0x928 => 0xa53
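
The backwards FAT search that produced this chain can be sketched in Python. This is only an illustration of the idea (find the FAT16 entry whose value is the cluster you already know, then repeat), not what DEBUG does internally. The helper names are hypothetical, and the demo FAT is synthetic: it just reproduces the chain above with an assumed end-of-chain marker after 0xa53.

```python
import struct

def fat16_entries(fat_bytes):
    """Decode a raw FAT16 table into a list of 16-bit little-endian entries."""
    count = len(fat_bytes) // 2
    return list(struct.unpack("<%dH" % count, fat_bytes[:count * 2]))

def chain_ending_at(fat, cluster):
    """Walk backwards from `cluster` by repeatedly finding the entry that points to it."""
    chain = [cluster]
    while True:
        # Entries 0 and 1 are reserved, so only clusters >= 2 count as predecessors.
        preds = [i for i, nxt in enumerate(fat) if i >= 2 and nxt == chain[0]]
        if not preds or preds[0] in chain:
            return chain  # no predecessor left (or a loop): chain[0] is the start
        chain.insert(0, preds[0])

def demo_fat():
    """Build a synthetic FAT containing only the chain from the post (assumed EOC after 0xa53)."""
    entries = [0] * 0xA60
    entries[0], entries[1] = 0xFFF8, 0xFFFF
    links = [0x24D, 0x24E, 0x25D, 0x928, 0xA53]
    for cur, nxt in zip(links, links[1:]):
        entries[cur] = nxt
    entries[links[-1]] = 0xFFFF  # assumption: 0xa53 is the last cluster of the file
    return struct.pack("<%dH" % len(entries), *entries)

if __name__ == "__main__":
    fat = fat16_entries(demo_fat())
    print(" => ".join(hex(c) for c in chain_ending_at(fat, 0xA53)))
```

On a healthy FAT each cluster has at most one predecessor, so taking the first match is enough; cross-linked files would need more care.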

The first bad cluster points to the second bad cluster. By reading the good sectors, I inferred the bad file was in "C:\DOS" based on its contents. I then did another DEBUG "s" on C:\DOS's directory entries to find that the bad file was DOSSHELL.SWP. Sure enough (CTTY COM2):

Code:
C:\DOS>type DOSSHELL.SWP

General failure reading drive C
Abort, Retry, Ignore, Fail?r

General failure reading drive C
Abort, Retry, Ignore, Fail?f
Fail on INT 24 - DOSSHELL.SWP

How To Fix?
I recall a crash happening recently (less than a week ago) while I was running DOSSHELL off a floppy, and I don't recall running CHECKIT since then. I don't think these particular sectors being bad is a coincidence. Maybe the crash happened during a multi-sector write[SUP]3[/SUP]? Were there any known bugs w/ DOSSHELL back in the day?

I don't care about saving DOSSHELL.SWP. The rest of the drive appears to be working fine for now, and I'll keep an eye on it. So my main question is: what recourse do I have to non-destructively remove both?

As a bonus, are there utilities to find "orphan" FAT chains that don't link to any existing file and display/free them?

As for track 71's soft errors, maybe I need to rewrite it. Are there utilities to rewrite a track safely[SUP]4[/SUP] without a low-level format beforehand[SUP]5[/SUP]?


Footnotes
  1. Yes, it's the original CMI drive, type 6. I've been using it for 10.5 years without issues.
  2. FAT16, 512 bytes/sector, 4 sectors/cluster, 1 reserved sector, 2 FATs, 512 root dir entries, 41840 sectors, FAT ID F8, 41 sectors per FAT
  3. I don't know if an interrupted multi-sector write explains the address mark errors though. I don't think that part of the disk was rewritten after the initial format?
  4. By "safely", I mean maybe back up the good sectors on the track to a floppy before doing the write.
  5. Assuming no power loss. If that happens, well, everything's backed up anyway, and I'll do another backup beforehand.
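
The BPB values in footnote 2 are enough to reproduce the cluster-to-CHS mapping from the Debugging section. Below is a small Python sketch of that arithmetic. One value is an assumption not stated in the post: that the partition starts at CHS 0,1,1, i.e. 17 hidden sectors before the boot sector. With that assumption the numbers land on the same CHS addresses given above.

```python
# Geometry and BPB values taken from the post (CHECKIT output and footnote 2).
HEADS = 4
SECTORS_PER_TRACK = 17
BYTES_PER_SECTOR = 512
SECTORS_PER_CLUSTER = 4
RESERVED_SECTORS = 1
NUM_FATS = 2
SECTORS_PER_FAT = 41
ROOT_ENTRIES = 512

# ASSUMPTION (not in the post): partition begins at CHS 0,1,1, so one full
# track of 17 sectors precedes the boot sector.
HIDDEN_SECTORS = 17

def cluster_to_chs(cluster):
    """Return (cylinder, head, sector) of the first sector of a data cluster."""
    root_sectors = (ROOT_ENTRIES * 32 + BYTES_PER_SECTOR - 1) // BYTES_PER_SECTOR
    first_data = RESERVED_SECTORS + NUM_FATS * SECTORS_PER_FAT + root_sectors
    # Clusters are numbered from 2, so cluster 2 is the first data cluster.
    lba = HIDDEN_SECTORS + first_data + (cluster - 2) * SECTORS_PER_CLUSTER
    cyl, rem = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sec0 = divmod(rem, SECTORS_PER_TRACK)
    return cyl, head, sec0 + 1  # CHS sectors are 1-based

if __name__ == "__main__":
    for c in (0x928, 0xA53):
        print(hex(c), cluster_to_chs(c))
```

Cluster 0x928 then covers 139,2,15 through 139,3,1 and cluster 0xa53 covers 157,1,4 through 157,1,7, which matches the eight failing sectors in the list.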
 
You are going to a lot of trouble trying to "fix" things manually.

I'd strongly suggest just starting over with a fresh low-level format, using either the controller BIOS or SpeedStor. If the drive head just wrote junk to the wrong place, a low-level format will fix it. If there is physical damage, then it will be more obvious.

I'd then FDISK and DOS-format the drive, fill it with random-looking data such as a large ZIP file (a uniform sector fill is easier for controllers to read than random data, so it makes a weaker test), then run Norton Disk Test 4.5 or something similar on it to check for bad sectors.

That should quickly expose most physically damaged sectors.

Periodically run Disk Test again to see if things are getting much worse. (Flagging one or two "weak" sectors later is not uncommon, though).

If you really want to stress test it, run Spinrite on it (choose to not return any marked bad clusters to use unless you are changing interleave). But be aware that if the drive is on its way out, that may push it over the edge. But in my opinion, it is better to know in advance rather than have it surprise you later.

Did I mention back up any data first? :p
 
CHKDSK will identify orphan clusters and save them in the root directory as FILExxxx.CHK if run with the /f switch. But since CHKDSK fixes a variety of sins, run it first without the /f option to see what it will do.
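
To make "orphan clusters" concrete, here is a rough Python sketch of the general idea: any cluster the FAT marks as allocated that can't be reached from some directory entry's starting cluster is lost. This is not CHKDSK's actual implementation, just an illustration; the function name and the toy FAT are made up, and a real tool would have to collect the starting clusters by walking every directory on the volume.

```python
def find_lost_chains(fat, file_start_clusters):
    """Return chains of allocated FAT16 clusters not reachable from any file."""
    FREE, BAD = 0x0000, 0xFFF7

    # Mark every cluster reachable by following each file's chain.
    reachable = set()
    for start in file_start_clusters:
        c = start
        while 2 <= c < len(fat) and c not in reachable:
            reachable.add(c)
            c = fat[c]  # end-of-chain values (>= 0xFFF8) fall outside the table

    # Allocated means neither free nor marked bad.
    allocated = {i for i in range(2, len(fat)) if fat[i] not in (FREE, BAD)}
    lost = allocated - reachable

    # A chain head is a lost cluster that no other lost cluster points to.
    # (A closed loop of lost clusters has no head and is ignored by this sketch.)
    heads = sorted(lost - {fat[c] for c in lost})
    chains = []
    for head in heads:
        chain, c = [], head
        while c in lost and c not in chain:
            chain.append(c)
            c = fat[c]
        chains.append(chain)
    return chains

if __name__ == "__main__":
    # Toy FAT: one real file (2 -> 3), one lost chain (5 -> 6 -> 9), one bad cluster (7).
    fat = [0] * 16
    fat[0], fat[1] = 0xFFF8, 0xFFFF
    fat[2], fat[3] = 3, 0xFFFF
    fat[5], fat[6], fat[9] = 6, 9, 0xFFFF
    fat[7] = 0xFFF7
    print(find_lost_chains(fat, [2]))
```

Each chain found this way corresponds to what would end up as one FILExxxx.CHK file.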
 
FWIW, SpeedStor has a nondestructive low-level format, so I could've used that.

But while on drive D:\, I backed up my disk with PKZIP using the following command:

Code:
pkzip -Whs -rp cm6426s.zip C:\*.*

This will back up everything, and I think all attributes were preserved. Comparing the "pkunzip -v cm6426s.zip" and "CHKDSK C:" totals confirmed that all the files on my hard disk made it into the archive. You may wish to use floppy disks via the -&sa or -&sb option in place of a file glob.

Afterwards, I low-level formatted using SpeedStor, and plan to let Media Analysis/Write-Burn in run for an hour or so after the thunderstorm stops. Will let you know how it goes.
 
CHKDSK will identify orphan clusters and save them in the root directory as FILExxxx.CHK if run with the /f switch. But since CHKDSK fixes a variety of sins, run it first without the /f option to see what it will do.

I had run CHKDSK /F yesterday. CHKDSK claimed nothing was wrong as of today (which of course is a lie, but I guess it's limited in what kinds of corruption it can find). In addition to the General Failure reading DOSSHELL.SWP, DOSSHELL.INI gave Data Errors as well when I tried to back up.

I can't find a FILExxxx.CHK file in my backup. Would it have been stored on the drive it was run on (I ran it while on drive D:\) and would the file be hidden?
 
Ultimately, I ended up doing a reinstall. I used SpeedStor from modem7's site to low-level format the drive, and then ran a surface analysis. I then compared the surface analysis to that of the IBM PC AT Advanced Diagnostics diskette. Both programs show the same three bad tracks (I couldn't duplicate the error on head 1, cylinder 567 after repeated attempts), and two of those bad tracks are the same ones I supplied when I formatted the drive back in 2010 (using the Advanced Diagnostics diskette).
[Images: 20210811_221346[1].jpg, 20210811_225558[1].jpg]

I can tolerate one bad track appearing after 10 years, so this leads me to think the drive still has some life left in it. But everything is backed up, so if the drive bites the dust, there's no big loss except me being upset that a drive died under my care :p. I am definitely a bit attached to this one, since CMI drives are known to malfunction, but this one lives.

Protip: Don't screw in the hard disk controller bracket while the drive is being written to. I was doing that to pass time during my first attempt to reinstall PC-DOS 5.0 after the format. The install failed at the end when doing the final command.com copy. And then when I reloaded DOS from a floppy, I got "Data Errors". Then when I reran SpeedStor, it claimed all tracks on cylinders 2, 4, 6, 8, 10, and 11 were unreadable. Doing another low level format and reinstalling DOS seems to have stopped the problem for now, and I'm back to the bad tracks shown above. I'll keep an eye on things.
 
If that happens, and the resulting depression starts to include suicidal thoughts, you must seek psychological help fast.

Rest assured, I will manage to go on somehow :). The hardware's lasted a long time anyway. Btw, your site has been quite helpful for quickly getting back up to speed :D!
 