8-Bit IDE Controller

Ha!

I spoke too soon. Mike's going to point and laugh and say "told ya so". :p

I just came across the error with this drive. It cannot write reliably. The reads seem to be just fine, since I was able to boot the thing, but with Mike's approval, I tried to fdisk and format the drive, and the thing just went to hell.

A little bit of playing around so far has shown me that if I blast the first few sectors to all FF's, it writes just fine.
If I then go back and change the sector to all 00's (using Norton DiskEdit), I get the following sequence on every 16 bytes:
Code:
00 FF 00 FF 00 FF 00 00 - 00 FF 00 FF 00 FF 00 00

Looks surprisingly like our 2nd byte of data is missing. Can we blame the flip flop not being able to deliver the data fast enough? It's got to be on the hairy edge though, since the 8th and 16th bytes are making it just fine.
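
For anyone who wants to tabulate the failures rather than eyeball them in a disk editor, here is a minimal sketch in Python. It only compares the 16-byte pattern quoted above against the all-zero pattern that was supposed to be written; the function name `bad_offsets` is made up for illustration and is not part of any existing tool.

```python
# Compare a sector dump against the pattern that was supposed to be written
# and report which byte offsets still hold stale data.

def bad_offsets(observed, expected):
    """Return zero-based offsets where observed differs from expected."""
    return [i for i, (got, want) in enumerate(zip(observed, expected))
            if got != want]

# The 16-byte pattern quoted above, seen after trying to overwrite FF with 00.
observed = bytes.fromhex("00 FF 00 FF 00 FF 00 00 00 FF 00 FF 00 FF 00 00")
expected = bytes(16)  # sixteen 00 bytes

failing = bad_offsets(observed, expected)
print(failing)                    # zero-based offsets of the stale bytes
print([i % 8 for i in failing])   # position inside each 8-byte group
```

On this sample the stale bytes sit at odd offsets 1, 3 and 5 of each 8-byte group, while offset 7 (the 8th and 16th bytes) is written correctly, which matches the description above.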

It's interesting indeed. Not sure if this is within my capability to debug, but I'll keep playing.

This sounds like we need someone with a logic analyzer or trace analyzer. Subtle timing incompatibilities are usually a real ***** to find. That's my $0.02.

Thanks and have a nice day!

Andrew Lynch
 
My parents are here visiting and they live in Austin Minnesota. That's near your neck of the woods, eh? Not that it has anything to do with Indian cuisine.

At any rate, I posted the PCB for routing on the N8VEM wiki. Please download and run the FreeRouting.net router. Normally, what I do is run the autorouter with "Autorouter - parameter - detail parameter - rip up" set to 100 (the default) until it finishes. Then change rip up to 10, repeat, change to 1, repeat, back to 10, repeat, 100, 1000, 10000, 100000, and then cycle back down and up twice. The later cycles will take less time.

http://n8vem-sbc.pbworks.com/browse/#view=ViewFolder&param=XT-IDE

http://www.freerouting.net/fen/index.php

This may seem excessive but the extra cycles will crunch out all the trace slack and unnecessary vias possible. Obviously, having as few vias and the minimum length as possible will improve reliability and performance. It is worth the extra effort.
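
To keep track of where you are in the sequence, the rip up values described above can be written out as a list. This is only a sketch of one reading of "cycle back down and up twice" (down to 1 and back up to 100000, two times); the function name `ripup_schedule` is hypothetical and the router itself is still driven by hand.

```python
# Build the list of rip up values to step through, one optimizer run each.

def ripup_schedule(cycles=2):
    ladder = [1, 10, 100, 1000, 10000, 100000]
    values = [100, 10, 1]            # start at the default, step down to 1
    values += ladder[1:]             # climb 10 ... 100000
    for _ in range(cycles):          # assumed meaning of "down and up twice"
        values += ladder[-2::-1]     # 10000 ... 1
        values += ladder[1:]         # 10 ... 100000
    return values

for step, value in enumerate(ripup_schedule(), start=1):
    print(f"run {step:2d}: rip up = {value}")
```

Printing the list and ticking off each run as it completes makes it easier to resume after an interruption.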

Thanks and have a nice day!

Andrew Lynch

PS, save frequently!

Austin MN is very close to me.

I will make these runs on Saturday morning - that will give me a reasonable block of time to concentrate and do it right. If I have questions I will send them using the private message feature here to avoid cluttering the thread.


Mike
 
Hargle .. utterly craptastic.

Looks like we should have hung around until dark and tried to get some time on a logic analyzer. I see a weekend debug session in the future ...

All things considered, it is just one drive among dozens that work correctly. But I'm wondering now, if there is something not quite right, how much of an effect it is having on other drives, and if anybody is looking for it.


Mike
 
Hi! Mike & Hargle, I appreciate your thoroughness but I think we have to accept we are not going to be able to support 100% of IDE devices. The engineering investigation is goodness but there are just a lot of screwy IDE-like drives and devices out there.

Based on the experiences with the N8VEM Disk IO board, which had its own timing issues, I would look first at the chip select lines on the drive relative to the controller. That's where we had issues with WD and Quantum drives while Seagate and Maxtor seemed to work just fine. You might even consider trying the "smart cable" approach to tweak the timing sequencing. The N8VEM Disk IO "smart cable" fixed compatibility with WD and Quantum but I just shelved the concept as a PITA to use. The "smart cable" is easy to build as it only requires a small PCB and 2 ICs. It is general though and can be applied to qualify signals with each other. Fixing one drive's timing issues may just break another's though.

If Mike or someone has a logic analyzer then maybe there is a chance to find the timing issue at its root. Another possibility is just brute force swapping of components among various 74X families until you stumble on the right timing combination or at least reveal clues. Certainly not the preferred approach but something doable until the test tools are available.

Still, no matter what we do there are going to be problematic drives and my suggestion is just to document them on the wiki and press on. The XT-IDE compatibility has been pretty good so far and Hargle, you've done superhuman effort to make the BIOS excellent. Publish the source and let builders tweak it. Maybe something will shake out eventually but I certainly wouldn't stress over it.

Regardless, I support the project and will follow your lead on where you want to go. The next version of the PCB looks really good. My recommendation is to review the new PCB design closely and make sure there are no problems with it that will cause us grief later on.

Thanks and have a nice day!

Andrew Lynch
 
hey gang,

Mike, if you've got a free Saturday or Sunday and want to pop up to my workplace, we can drag out the logic analyzer and get our hands dirty with this thing. The next 2 weekends I'm busy, but the first two in August appear to be available currently. Either Saturday or Sunday would work.

I'd be willing to spend at most 1 day trying to get to the bottom of it. Maybe not getting a fix for it, but at least trying to get a bead on what the problem is so that we could make the next step to fix it. I want to put a time limit on such an endeavor entirely because this card is already a home-run in my mind (kb2syd may not think so though!) and we could easily burn a LOT of time trying to get that last 2% working, when there are plenty of drives that seem like they are all working just fine.


---

As for the next proto run, I'll gladly pick up the costs for it, but I think doing any more than 10 cards is a waste, considering there are only about 5 of us actively working on the debug. I'm willing to pull the trigger anytime the layout is finished.
 
Still, no matter what we do there are going to be problematic drives and my suggestion is just to document them on the wiki and press on.

My operating system background is going to show through here. Consider the following:

  • This is a 16GB IBM drive from the DeskStar product line. Not exactly an obscure manufacturer or size.
  • On my card it IDs, but can't read or write data at all. It's not a case of corruption even - it's a permanent 'drive not ready'.
  • Hargle is seeing a pretty nice and repeatable pattern of corruption on his cards.

Most of the testing that has been done has been very basic. There might be many other drives out there that are sporadically corrupting bytes, and people won't realize it for a long time until things really get screwy. Our general level of testing isn't robust enough. Put another way, we don't know that all of the other drives are working perfectly.

Before we start 'blacklisting' particular drives and moving on, I want to understand what is special about this not-very-special drive and understand why it is like this. Especially since this pattern of corruption seems to be repeatable, something that I couldn't see with my version of the card because it was just so badly behaved there. After we understand what is wrong, then we need to determine if it applies to other drives or components on the card.

My 2 cents .. nothing more. Hargle has access to test equipment. We just need to find some time to do it.


Mike
 
On my card it IDs, but can't read or write data at all. It's not a case of corruption even - it's a permanent 'drive not ready'.

This is one of the bits that may help us figure out the issue in a timely manner.
My cards read wonderfully. Yours doesn't, so if we can get some scope/analyzer shots of the differences between the two, we might be able to get some more insight.

My other worry is that I'm not so much of a hardware/scope monkey. I can do a few things on the logic analyzer to be considered dangerous, but setting up complex triggers is outside my comfort zone. We need someone with good hardware skills to help here too.


FWIW: I have a helltest program that writes random data out to random sectors of the drive and then reads it back and compares. You can even do the read portion later, so you can write the data patterns with a known good machine and do reads only on the XT or vice versa. Anyway, said program caught this failure immediately. No other drive I've worked with has had any issue with this program. I see that I don't have it on the wiki, so I will make it available shortly, as I think this is an excellent tool for beating the crap out of the card and any drive that we want to test with. It's likely going to bring out errors faster than just your standard fdisk+format test that most of us are doing.
 
I was going to write my own, but I'll save some time and use yours. If I can run such a test on a drive for a few hours with no failures, then I'm happy to call things good.


Mike
 
Andrew,

I've started it running. If I read the directions correctly I am just varying the ripup parameter. The help wasn't very good - can you give me a better description of what this parameter controls, and why it helps to 'massage' the board with it in the pattern you specified?

It looks really nice on the screen. Too bad it's not threaded ... I could be running it 4x faster.


Mike
 
I'm not worried about cutting the capacity of the drive in half. I'm more worried about sector addressing issues.

If you do it that way, then you wind up with the equivalent of 256 byte sectors when they should be 512. Compensating for that is worse than just doing it correctly.

That trick is acceptable on an 8 bit micro where there is no existing API or expectations about sector size. It's not so hot for a PC solution.

From a software standpoint, it doesn't seem that bad to me. Far worse is having 1024 byte physical sectors and blocking/deblocking to 512. But going from bigger logical to smaller physical isn't a problem. You just read or write 2 when the BIOS request asks for 1.
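
A rough sketch of that "read 2 when asked for 1" mapping, in Python for readability. It assumes each physical sector yields only 256 usable bytes and that logical sector n lives in physical sectors 2n and 2n+1; the names `physical_sectors`, `read_logical` and the `read_phys` callback are hypothetical and not taken from the BIOS source.

```python
USABLE_BYTES = 256                    # usable bytes per physical sector (assumed)
LOGICAL_BYTES = 512                   # sector size the BIOS caller expects
PER_LOGICAL = LOGICAL_BYTES // USABLE_BYTES

def physical_sectors(logical_lba, count=1):
    """Physical sector numbers backing `count` logical sectors."""
    first = logical_lba * PER_LOGICAL
    return list(range(first, first + count * PER_LOGICAL))

def read_logical(read_phys, logical_lba):
    """Assemble one 512-byte logical sector from its physical halves."""
    return b"".join(read_phys(p) for p in physical_sectors(logical_lba))

# Toy disk: physical sector n holds 256 copies of the byte n.
fake_disk = {n: bytes([n]) * USABLE_BYTES for n in range(8)}
sector = read_logical(fake_disk.__getitem__, 3)
print(len(sector), sector[0], sector[-1])
```

The mapping itself is cheap; the objection raised in this exchange is that a drive written this way no longer looks like a normal 512-byte-sector disk to any other system.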
 
That also ruins any chance of reading the drive on another system without software help.

That design discussion was a long time ago .. it might be a suitable way to go for a simpler version of the board.


Mike
 
It can simply be an option in the SetCard program too, since it actually only depends on the firmware if the current design is used.
 
Still, no matter what we do there are going to be problematic drives and my suggestion is just to document them on the wiki and press on. The XT-IDE compatibility has been pretty good so far and Hargle, you've done superhuman effort to make the BIOS excellent. Publish the source and let builders tweak it. Maybe something will shake out eventually but I certainly wouldn't stress over it.

Here is the spreadsheet to document this. Add more rows as needed. I will turn it into a table for our Wiki and keep the table up to date.


My recommendation is to review the new PCB design closely and make sure there are no problems with it that will cause us grief later on.

My suggestions are mostly aesthetic and are non-functional.

Ok, first thing I notice is the instructions for JP1 and JP2 are not very clear. Rom Enabled and Write Enabled are sort of floating there with no clear indication. There is a ton of space on the right of the jumpers. Perhaps this could fit in the space:

Code:
JPx: On  = Write Enabled*     JPx: On  = Rom Enabled*
     Off = Write Disabled          Off = Rom Disabled

* Default Value

Second thing is Conn_1 is not defined either. It would be nice to add a "P-5 = Hard Drive LED" in the space to the right of it.

Maybe after where it says "SW1" put "See back for details" or something like that.

Conn_5x2 also needs the word "IRQ" somewhere around it. Perhaps make the numbers a bit smaller, and put it centered below them.

Conn_3 needs a bit more explanation as well. There is space below it that could work for this.

Put the number 1 on the left of DIPS_08 and the number 8 on its right, this will help people read and use the table on the back of the card.

On the table on the back of the card put a star next to the default values (and add "* Default" at the bottom). Changing the format could make it a little more readable, try this:

Code:
Io Range
Dip Switches 4:1
4 3 2 1
0 0 0 0 = 500h
etc...
1 0 0 0 = 300h*
etc...
0 = Off 1 = On    *Default setting.

Adding the dip numbers at the top and adding the space between the numbers would make it a lot more readable and easier to transfer the setting to the front.

Well, that's my .02 on the text of the card; other than that the layout looks damn good (thumbs up to Andrew for this one).

And in the words of our illustrious circuit designer :D

Thanks and have a nice day!
 
Hi! I added the table to the copper silk screen already so it's on there. That was relatively easy. It's trying to figure out if it's accurate that is confusing me.

Am I to understand from the table that an IO Range of 340h and Memory of E8000-E9FFF would look like this (as you look at the front of the card):

Code:
      SW1
1 2 3 4 5 6 7 8
0 1 0 1 0 1 0 1

or would it be
Code:
      SW1
4 3 2 1 8 7 6 5 (ie the switch numbers are not quite sequential)
1 0 1 0 1 0 1 0
 
Andrew,

I've started it running. If I read the directions correctly I am just varying the ripup parameter. The help wasn't very good - can you give me a better description of what this parameter controls, and why it helps to 'massage' the board with it in the pattern you specified?

It looks really nice on the screen. Too bad it's not threaded ... I could be running it 4x faster.


Mike

Hi Mike! The "rip up" parameter in the autorouter/trace optimizer sets the "weight" the algorithm uses to decide when it can remove existing traces to help route new ones. The lower the "rip up" weight, the "deeper" the algorithm can search by pulling up existing traces to route new ones. Larger weights tend to attempt routing with little or no removal of existing traces.

Each approach has its benefits and costs. The deeper searches tend to find routes which eliminate vias that shallower searches miss. Shallower searches tend to be better at reducing the overall length of the traces by improved routing of the current trace paths.

My finding is two full passes through the entire spectrum of rip up weights will produce a very well routed board with minimum vias and tight minimal trace lengths.

The key is to let the optimizer run until completion on each setting. For example, on rip up weight 100 the router will run many passes and keep going until it completes two back to back runs where it can neither reduce vias nor overall trace length.
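
The stopping rule described here (keep going until two back-to-back runs improve neither the via count nor the total trace length) can be sketched as a simple loop. This is an illustration only, not how FreeRouting is implemented; `run_until_stable`, `run_pass` and the toy pass below are hypothetical, and the sketch assumes a pass never makes the board worse.

```python
def run_until_stable(run_pass, vias, length, quiet_needed=2):
    """Repeat optimizer passes until `quiet_needed` passes in a row
    reduce neither the via count nor the total trace length."""
    quiet = 0
    passes = 0
    while quiet < quiet_needed:
        new_vias, new_length = run_pass(vias, length)
        passes += 1
        if new_vias < vias or new_length < length:
            quiet = 0            # progress made, reset the counter
        else:
            quiet += 1           # nothing gained on this pass
        vias, length = new_vias, new_length
    return vias, length, passes

def toy_pass(vias, length):
    """Stand-in for one optimizer pass: removes one via until it hits a floor."""
    if vias > 79:
        return vias - 1, length - 0.5
    return vias, length

print(run_until_stable(toy_pass, vias=82, length=430.0))
```

In practice the router does this bookkeeping itself; the loop just shows why each rip up setting has to be left alone until the run ends on its own.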

The PCB started with like 157 vias and >430 inches of overall trace length. With some quick optimization on my home PC I was able to shrink the via count down to 79 or so. I suspect it can go much lower though with some patience in the auto router. Even small simplifications can yield big improvements in signal quality and board reliability so it is definitely worth the extra time needed to crunch the PCB trace routing down as far as it will go.

If you are feeling ambitious, you can also do manual routing to improve results. However, when you manually route you are not as constrained by the rules of the optimizer so you can sometimes introduce "weirdness" that will complicate later trace optimization. The optimizer has all sorts of rules it follows like weights for component versus copper side trace orthogonality, routing angles, etc.

I hope this helps! If you have any questions please let me know. KiCAD/FreeRouting.net is a free tool so that's why I insist on using it. There are commercial solutions available I could use but I want to keep the barriers to entry for the N8VEM builders as low as possible. A truly useful version of Eagle costs many hundreds of dollars although there are limited capability versions available for hobbyists, etc.

Thanks and have a nice day!

Andrew Lynch
 
Am I to understand from the table that an IO Range of 340h and Memory of E8000-E9FFF would look like this (as you look at the front of the card):

Code:
      SW1
1 2 3 4 5 6 7 8
0 1 0 1 0 1 0 1

or would it be
Code:
      SW1
4 3 2 1 8 7 6 5 (ie the switch numbers are not quite sequential)
1 0 1 0 1 0 1 0


Hi! I think neither is right. It just occurred to me there is yet another level of bit flipping. To register a 0 in a bit, the switch has to be ON to ground the pin. In other words IO=300h, ROM=D0000 would be

SW1

1 2 3 4 5 6 7 8
0 1 1 1 0 1 0 0

which is the exact opposite of what I have on the silk screen table. Argh.

IO=200h, ROM=C0000h would be

SW1

1 2 3 4 5 6 7 8
1 1 1 1 1 1 1 1

IO=3E0h, ROM=FC000h would be

SW1

1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0

It appears the table needs some rework. Would adding a note to the table, "0=ON, 1=OFF", be too confusing? Probably.
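
The inversion described above (a switch that is ON grounds the pin and registers a 0) can be written down as a tiny helper. This sketch only flips logic levels into ON/OFF positions; it deliberately makes no assumption about which switch carries which address bit, and the name `switch_settings` is invented for illustration.

```python
def switch_settings(bits):
    """Map the logic level wanted on each pin to the switch position:
    a 0 needs the switch ON (pin grounded), a 1 needs it OFF."""
    return ["ON" if bit == 0 else "OFF" for bit in bits]

# Logic levels wanted on switches 1..8, lowest and highest settings.
print(switch_settings([0, 0, 0, 0, 0, 0, 0, 0]))   # every switch ON
print(switch_settings([1, 1, 1, 1, 1, 1, 1, 1]))   # every switch OFF
```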

Thanks and have a nice day!

Andrew Lynch
 
I only have a rudimentary knowledge of circuits - my last class on it was 16 years ago. I'm fascinated that anything this powerful is free, but I guess that is part of the trend toward open software over the last 15 to 20 years.

Right now I'm very content to let it crunch away. But it is slow. If I interrupt it during a batch run to save the current work instead of letting it run to completion, how bad is that?

Right now it has been running over 12 hours and it has removed 3 vias and 0.33% of the original total trace length.
 
Hi! Thanks for all the great comments on the silk screen labels and notation. This all seems so clear to me on the bench, but when staring at it on a computer screen it is all a jumble. Hopefully this is a better description.

I uploaded a new PCB layout. Please check it out and post any comments on improvements. This is getting better I think.

Thanks and have a nice day!

Andrew Lynch
 
I only have a rudimentary knowledge of circuits - my last class on it was 16 years ago. I'm fascinated that anything this powerful is free, but I guess that is part of the trend toward open software over the last 15 to 20 years.

Right now I'm very content to let it crunch away. But it is slow. If I interrupt it during a batch run to save the current work instead of letting it run to completion, how bad is that?

Right now it has been running over 12 hours and it has removed 3 vias and 0.33% of the original total trace length.

Hi Mike! Yes, FreeRouting.net is notoriously slow so be sure to reserve 3-4 weeks of CPU time to burn through all the iterations. At least that's what it takes on my machines. You can interrupt jobs any time you like and it will just start up from where you left off. However, it starts all passes from the lower left and works to the upper right no matter where you abort.

My recommendation is to save early and often. Just keep running until the optimizer runs out of better solutions and then change the rip up value to perturb the board for more and better crunching. It requires some patience but it will pay off. Some of my N8VEM boards have run for literally months on end.

There is another benefit to running the optimizer that is not immediately apparent... every pass that makes changes tends to improve trace orthogonality, so that horizontal traces are on the component side and vertical traces are on the copper side. This is real goodness and improves circuit performance by reducing inadvertent coupling between traces. The fewer traces the better for PCB performance.

Thanks and have a nice day!

Andrew Lynch
 