Problem Statement
Yesterday, I ran CHECKIT on my PC AT hard disk[SUP]1[/SUP] running DOS 5.0 and got a pleasant surprise:
Code:
=== HARD DISK 0 (C:) ===
Test Configuration:
Cylinders: 614
Heads: 4
Sectors: 17
Capacity: 21,377,024 Bytes
Test Results:
Controller Diagnostic Test..................................Passed
Linear Read.................................................FAILED ***
Butterfly Read..............................................Aborted
Cyl  Head Severity Test        Notes
---- ---- -------- ----------- -----------------------------------
139  2    MAJOR    Linear      DOS File. Drive C:.
139  3    MAJOR    Linear      DOS File. Drive C:.
140  3    MAJOR    Linear      DOS File. Drive C:.
157  1    MAJOR    Linear      DOS File. Drive C:.
334  2    OK       Linear      Marked by Low Level Format.
565  1    OK       Linear      Marked by Low Level Format.
*** END TESTS: 4 ERRORS ENCOUNTERED ***
I'd never had trouble with this hard disk until recently. Track 71 occasionally has a soft ECC error, but it goes away. It's definitely possible the drive is on its way out after 37 years, but I'm still kinda attached to it, seeing as it's the original drive and one of the few notorious CMI drives that still work.
Debugging
I don't know what the "DOS File." note means, so I decided to dig deeper. Here's what I found by running a bunch of int 0x13 reads in DEBUG and checking the returned error code (listed as cylinder, head, sector):
- 139, 2, 15: Cannot find address mark
- 139, 2, 16: Cannot find address mark
- 139, 2, 17: CRC Error
- 139, 3, 1: CRC Error
- 157, 1, 4: CRC Error
- 157, 1, 5: Cannot find address mark
- 157, 1, 6: CRC Error
- 157, 1, 7: Cannot find address mark
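For anyone decoding these by hand: the two messages correspond to BIOS INT 13h status codes returned in AH (with carry set on error), 02h for "address mark not found" and 10h for an uncorrectable CRC/ECC error. A soft ECC error like the one on track 71 would most likely show up as 11h ("ECC corrected data error"), where the data was recovered. Below is a minimal Python sketch of the lookup. The table is only a subset of the codes, and the AH value paired with each sector is my assumption based on the messages above, not captured output.

```python
# Subset of BIOS INT 13h status codes: returned in AH, with CF set on error.
INT13_STATUS = {
    0x00: "no error",
    0x01: "bad command",
    0x02: "address mark not found",
    0x04: "sector not found",
    0x0A: "bad sector flag detected",
    0x10: "uncorrectable CRC/ECC error",
    0x11: "ECC corrected data error",  # data was recovered; a "soft" error
    0x20: "controller failure",
    0x40: "seek failed",
    0x80: "timeout (drive not ready)",
}


def describe(status: int) -> str:
    """Return a readable name for an INT 13h AH status byte."""
    return INT13_STATUS.get(status, f"unknown status {status:02X}h")


# ASSUMPTION: AH values inferred from the messages above, not captured output.
OBSERVED = [  # (cylinder, head, sector, AH)
    (139, 2, 15, 0x02), (139, 2, 16, 0x02), (139, 2, 17, 0x10), (139, 3, 1, 0x10),
    (157, 1, 4, 0x10), (157, 1, 5, 0x02), (157, 1, 6, 0x10), (157, 1, 7, 0x02),
]

if __name__ == "__main__":
    for c, h, s, ah in OBSERVED:
        print(f"{c}, {h}, {s}: AH={ah:02X}h -> {describe(ah)}")
```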
Next, I tried to figure out which file these sectors belong to. Using the BPB[SUP]2[/SUP], I worked out that CHS 139, 2, 15 is the start of FAT cluster 0x928 and CHS 157, 1, 4 is the start of cluster 0xa53. Each cluster is 4 sectors, so each run of bad sectors covers exactly one cluster, and I confirmed that the sectors on either side of these clusters read fine. By reading the FAT with more int 0x13 calls and searching it with DEBUG's "s" command, I found the following FAT cluster chain:
Code:
0x24d => 0x24e => 0x25d => 0x928 => 0xa53
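Here's a rough Python sketch of the arithmetic behind the CHS-to-cluster mapping and the FAT search, using the geometry from the CHECKIT report (4 heads, 17 sectors/track) and the BPB from footnote 2. Two assumptions: the partition starts at cylinder 0, head 1, sector 1 (17 hidden sectors, which isn't in the footnote but makes both mappings come out exactly), and the toy FAT ends the chain with FFFFh. `fat16_predecessors` mimics what the DEBUG "s" search does: look for the little-endian word of a cluster number and keep only word-aligned hits, since each FAT16 entry is 2 bytes.

```python
# Geometry from the CHECKIT report and BPB values from footnote 2.
HEADS, SECTORS_PER_TRACK = 4, 17
BYTES_PER_SECTOR = 512
SECTORS_PER_CLUSTER = 4
RESERVED_SECTORS = 1
NUM_FATS = 2
SECTORS_PER_FAT = 41
ROOT_ENTRIES = 512
# ASSUMPTION: partition starts at C0/H1/S1, i.e. 17 hidden sectors (not in footnote 2).
HIDDEN_SECTORS = 17

ROOT_DIR_SECTORS = ROOT_ENTRIES * 32 // BYTES_PER_SECTOR  # 32 sectors
# First data sector, relative to the partition: 1 + 2*41 + 32 = 115.
DATA_START = RESERVED_SECTORS + NUM_FATS * SECTORS_PER_FAT + ROOT_DIR_SECTORS


def chs_to_lba(c: int, h: int, s: int) -> int:
    """Convert BIOS CHS (sectors numbered from 1) to an absolute LBA."""
    return (c * HEADS + h) * SECTORS_PER_TRACK + (s - 1)


def lba_to_cluster(lba: int) -> tuple[int, int]:
    """Return (cluster, sector index within that cluster) for a data-area LBA."""
    rel = lba - HIDDEN_SECTORS - DATA_START
    if rel < 0:
        raise ValueError("sector is not in the data area")
    return 2 + rel // SECTORS_PER_CLUSTER, rel % SECTORS_PER_CLUSTER


def fat16_next(fat: bytes, cluster: int) -> int:
    """Read one FAT16 entry (little-endian word)."""
    return int.from_bytes(fat[2 * cluster:2 * cluster + 2], "little")


def fat16_predecessors(fat: bytes, cluster: int) -> list[int]:
    """Like DEBUG's "s": find the entries whose value is `cluster`."""
    needle = cluster.to_bytes(2, "little")
    hits, i = [], fat.find(needle)
    while i != -1:
        if i % 2 == 0:  # only word-aligned matches are real entries
            hits.append(i // 2)
        i = fat.find(needle, i + 1)
    return hits


def fat16_chain(fat: bytes, start: int) -> list[int]:
    """Follow a chain until end-of-chain (FFF8h+), a free/bad marker, or a loop."""
    chain = [start]
    while True:
        nxt = fat16_next(fat, chain[-1])
        if nxt < 2 or nxt >= 0xFFF7 or nxt in chain:
            return chain
        chain.append(nxt)


def toy_fat() -> bytearray:
    """Hypothetical FAT holding only the chain found above."""
    fat = bytearray(SECTORS_PER_FAT * BYTES_PER_SECTOR)
    fat[0:4] = b"\xf8\xff\xff\xff"  # entries 0 and 1: media byte F8h
    links = [0x24D, 0x24E, 0x25D, 0x928, 0xA53, 0xFFFF]  # FFFFh end marker is assumed
    for here, nxt in zip(links, links[1:]):
        fat[2 * here:2 * here + 2] = nxt.to_bytes(2, "little")
    return fat


if __name__ == "__main__":
    for chs in [(139, 2, 15), (139, 3, 1), (157, 1, 4), (157, 1, 7)]:
        cluster, idx = lba_to_cluster(chs_to_lba(*chs))
        print(chs, f"-> cluster {cluster:#x}, sector {idx} of 4")
    fat = toy_fat()
    print([hex(c) for c in fat16_chain(fat, 0x24D)])
    print([hex(c) for c in fat16_predecessors(fat, 0xA53)])
```

Working backwards with `fat16_predecessors` from 0x928 is how the start of the chain (0x24d) can be found without knowing the file first.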
The first bad cluster points directly to the second one. Judging by the contents of the readable sectors, I inferred the file was in C:\DOS. Another DEBUG "s" search, this time over C:\DOS's directory entries, showed that the bad file is DOSSHELL.SWP. Sure enough (console redirected with CTTY COM2):
Code:
C:\DOS>type DOSSHELL.SWP
General failure reading drive C
Abort, Retry, Ignore, Fail?r
General failure reading drive C
Abort, Retry, Ignore, Fail?f
Fail on INT 24 - DOSSHELL.SWP
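For the directory step: each FAT directory entry is 32 bytes, with the starting cluster stored as a little-endian word at offset 26 (1Ah), so searching a directory for the bytes 4D 02 turns up the entry whose chain starts at cluster 0x24d. A minimal Python sketch follows; walking entry by entry avoids false matches elsewhere in the entry. The toy directory contents (the EXAMPLE.TXT entry and the zero file sizes) are hypothetical.

```python
import struct

# One 32-byte FAT directory entry: name, ext, attribute, 10 reserved bytes,
# time, date, starting cluster (offset 26), file size (offset 28).
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")


def find_by_start_cluster(directory: bytes, cluster: int):
    """Return (NAME.EXT, size) of the entry whose chain starts at `cluster`, or None."""
    for off in range(0, len(directory) - DIR_ENTRY.size + 1, DIR_ENTRY.size):
        name, ext, _attr, _res, _time, _date, start, size = DIR_ENTRY.unpack_from(directory, off)
        if name[0] == 0x00:  # end of directory
            break
        if name[0] == 0xE5:  # deleted entry, skip it
            continue
        if start == cluster:
            base = name.rstrip(b" ").decode("ascii")
            suffix = ext.rstrip(b" ").decode("ascii")
            return (f"{base}.{suffix}" if suffix else base), size
    return None


def make_entry(name: bytes, ext: bytes, start: int, size: int) -> bytes:
    """Build a hypothetical entry (attribute 20h = archive, zero time/date)."""
    return DIR_ENTRY.pack(name.ljust(8), ext.ljust(3), 0x20, bytes(10), 0, 0, start, size)


if __name__ == "__main__":
    toy_dir = make_entry(b"EXAMPLE", b"TXT", 0x002, 0) + make_entry(b"DOSSHELL", b"SWP", 0x24D, 0)
    print(find_by_start_cluster(toy_dir, 0x24D))
```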
How To Fix?
I remember a crash recently (less than a week ago) while I was running DOSSHELL off a floppy, and I don't recall running CHECKIT since then. I doubt it's a coincidence that these particular sectors are the bad ones. Maybe the crash happened in the middle of a multi-sector write[SUP]3[/SUP]? Were there any known DOSSHELL bugs back in the day that could cause this?
I don't care about saving DOSSHELL.SWP. The rest of the drive appears to be working fine for now, and I'll keep an eye on it. So my main question is: what recourse do I have to non-destructively remove both bad clusters?
As a bonus, are there utilities to find "orphan" FAT chains that don't link to any existing file and display/free them?
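For reference, a utility like that has to do roughly the following: walk the chain of every starting cluster referenced from a directory entry (including the subdirectories themselves), then report allocated clusters that none of those chains reach, grouped into chains. A minimal Python sketch on a toy FAT16 table; the inputs are hypothetical, and a lost chain that loops back on itself with no head is not reported.

```python
BAD = 0xFFF7  # FAT16 bad-cluster marker; FFF8h-FFFFh mark end of chain


def entry(fat: bytes, n: int) -> int:
    """Read FAT16 entry n (little-endian word)."""
    return int.from_bytes(fat[2 * n:2 * n + 2], "little")


def find_orphan_chains(fat: bytes, start_clusters) -> list[list[int]]:
    """Group allocated clusters that no directory entry's chain reaches."""
    limit = min(len(fat) // 2, BAD)  # valid cluster numbers are 2 .. limit-1

    # 1. Mark every cluster reachable from a known starting cluster.
    reachable = set()
    for c in start_clusters:
        while 2 <= c < limit and c not in reachable:
            reachable.add(c)
            c = entry(fat, c)

    # 2. Allocated = not free (0) and not marked bad; lost = allocated but unreachable.
    lost = {c for c in range(2, limit)
            if entry(fat, c) not in (0, BAD) and c not in reachable}

    # 3. Chain heads are lost clusters that no other lost cluster points to.
    pointed_to = {entry(fat, c) for c in lost}
    chains = []
    for head in sorted(lost - pointed_to):
        chain, seen, c = [], set(), head
        while c in lost and c not in seen:
            chain.append(c)
            seen.add(c)
            c = entry(fat, c)
        chains.append(chain)
    return chains


def set_entry(fat: bytearray, n: int, value: int) -> None:
    """Write FAT16 entry n."""
    fat[2 * n:2 * n + 2] = value.to_bytes(2, "little")


if __name__ == "__main__":
    # Hypothetical 64-entry FAT: file at 2->3, orphan chain 10->11->12, bad cluster 20.
    fat = bytearray(128)
    for n, v in [(2, 3), (3, 0xFFFF), (10, 11), (11, 12), (12, 0xFFFF), (20, BAD)]:
        set_entry(fat, n, v)
    print(find_orphan_chains(fat, start_clusters=[2]))
```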
As for track 71's soft errors, maybe I need to rewrite it. Are there utilities to rewrite a track safely[SUP]4[/SUP] without a low-level format beforehand[SUP]5[/SUP]?
Footnotes
1. Yes, it's the original CMI drive, type 6. I've been using it for 10.5 years without issues.
2. FAT16, 512 bytes/sector, 4 sectors/cluster, 1 reserved sector, 2 FATs, 512 root directory entries, 41840 sectors, FAT ID F8, 41 sectors/FAT.
3. I don't know if an interrupted multi-sector write explains the address mark errors, though. I don't think that part of the disk was rewritten after the initial format?
4. By "safely", I mean backing up the good sectors on the track to a floppy before doing the write.
5. Assuming no power loss. If that does happen, well, everything's backed up anyway, and I'll do another backup beforehand.