HOWTO: Improve the performance of the XTIDE controller.

eeguru · Jan 24, 2012

Trixter said:
What I do know for sure is that the only system where DMA has the advantage is 808x. Once you hit 80186/NEC V30 and gain REP INSW, it exceeds what DMA can do.

I don't follow your logic at all. First DMA is way faster than REP MOVSW, yet REP INSW is way faster than DMA?

Confused. And pearce_jj, no. As of tonight I only barely have it working reliably on the Atmel. My work schedule has been rough, but your board will be mailed by the weekend.

Trixter · Jan 24, 2012

First DMA is way faster than REP MOVSW, yet REP INSW is way faster than DMA?

I was talking about two things in a single post -- sorry, my fault. I meant to write:

On 8088: DMA is faster than REP MOVSW because the CPU takes 4 cycles to move a byte but DMA can do it in 3, according to Chuck's last post (2 cycles/byte with 1 CPU cycle guaranteed between bytes).

On 80286: DMA transfers from a memory window are slower than any other method (REP INSW to snag words from a port, or REP MOVSW to copy from a memory window).
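To put the 8088 figures into numbers, here is a rough sketch that converts cycles per byte into time per 512-byte sector. The 4 and 3 cycles/byte values are the ones quoted in the post (not measurements), and 4.77 MHz is the nominal PC/XT clock; treat the results as ballpark only.

```python
# Rough per-sector transfer time from a cycles-per-byte figure.
# Assumptions: 4 cycles/byte (CPU) and 3 cycles/byte (DMA) as quoted in the
# post, a nominal 4.77 MHz clock, and 512-byte sectors.

CLOCK_HZ = 4.77e6      # nominal 8088 clock in the PC/XT (approximate)
SECTOR_BYTES = 512


def sector_time_us(cycles_per_byte: float, clock_hz: float = CLOCK_HZ,
                   nbytes: int = SECTOR_BYTES) -> float:
    """Microseconds needed to move nbytes at the given cycles per byte."""
    return cycles_per_byte * nbytes / clock_hz * 1e6


cpu_us = sector_time_us(4)   # CPU copy at 4 cycles/byte
dma_us = sector_time_us(3)   # DMA at 3 cycles/byte

print(f"CPU: {cpu_us:.0f} us/sector, DMA: {dma_us:.0f} us/sector")
print(f"DMA saves {100 * (1 - dma_us / cpu_us):.0f}% of the data-transfer time")
```

With these figures DMA saves a quarter of the raw data-movement time, before counting any per-transfer setup cost.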

My point was that using DMA for transfers on 8088 is the fastest option, but it doesn't make sense to force it on all platforms. Sorry for the confusion.

There's nothing wrong with how things are right now -- I love memory windows! I haven't looked at the code recently, but as long as it's not trying to do a REP MOVS with a segment override, it should work on all 8088s and the technique is already near optimum. I was just throwing the DMA thing out there.

eeguru · Jan 24, 2012

What's confusing is this: what makes the REP MOVSW timing diagram or the DMA timing diagram look any different on an 8088 versus an 80286? If they are the same, then the only difference is the bus clock: typically 5 MHz for early 8237s, 4.77 MHz for 8088s, and 6 or 8 MHz for 80286s, which isn't a large jump between any of them.
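If the timing diagrams really are identical, transfer time scales only with 1/clock. The sketch below uses the clock figures from the post to show how small that spread is; it assumes identical cycle counts on every platform, which is exactly the premise being questioned.

```python
# If bus timing diagrams are identical, transfer time scales only with
# 1/clock. Clock values are the ones mentioned in the post.

clocks_mhz = {
    "8088 (PC/XT)": 4.77,
    "early 8237": 5.0,
    "80286 (6 MHz)": 6.0,
    "80286 (8 MHz)": 8.0,
}

base = clocks_mhz["8088 (PC/XT)"]
for name, mhz in clocks_mhz.items():
    period_ns = 1000.0 / mhz          # one clock period in nanoseconds
    speedup = mhz / base              # relative to the 4.77 MHz 8088
    print(f"{name:14s} {period_ns:6.1f} ns/clock  {speedup:4.2f}x")

# Widest spread between the slowest and fastest clocks listed
max_ratio = max(clocks_mhz.values()) / min(clocks_mhz.values())
print(f"max ratio: {max_ratio:.2f}x")
```

Even the widest spread here is under 2x, so clock speed alone can't produce an order-of-magnitude difference between platforms.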

Chuck(G) · Jan 24, 2012

One very nice thing about using programmed I/O instead of DMA is that one doesn't have to worry about the 64K physical memory boundary problem (those nasty error 9s from the BIOS).

But back to DMA...

Looking at the XT hard disk BIOS, DMA is set up as "single" transfer. So bus negotiation proceeds with DRQ asserted, then HRQ, wait for HLDA, set up addresses, issue ADSTB, perform read/write, issue DACK, release HRQ...

Something like 12-16 clocks per byte? Not all that fast.
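In single-transfer mode the whole bus negotiation listed above repeats for every byte, so the per-byte clock cost is the whole story. This sketch converts that estimate into throughput and sector time; the 12-16 clock range is Chuck's rough estimate, and 4.77 MHz is the nominal PC/XT clock.

```python
# Throughput implied by a fixed per-byte DMA cost in single-transfer mode,
# where the request/grant/transfer/release sequence is repeated per byte.
# Assumptions: 12-16 clocks/byte is an estimate from the post; 4.77 MHz is
# the nominal PC/XT clock; sectors are 512 bytes.

CLOCK_HZ = 4.77e6
SECTOR_BYTES = 512


def throughput_kib_s(clocks_per_byte: float, clock_hz: float = CLOCK_HZ) -> float:
    """Sustained throughput in KiB/s for a fixed per-byte clock cost."""
    return clock_hz / clocks_per_byte / 1024


def sector_time_us(clocks_per_byte: float, clock_hz: float = CLOCK_HZ,
                   nbytes: int = SECTOR_BYTES) -> float:
    """Microseconds to move one sector at the given per-byte cost."""
    return clocks_per_byte * nbytes / clock_hz * 1e6


for clocks in (12, 16):
    print(f"{clocks} clocks/byte -> {throughput_kib_s(clocks):.0f} KiB/s, "
          f"{sector_time_us(clocks):.0f} us/sector")
```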

On the 5170, 16-bit DMA is handled through a second, cascaded 8237, which makes the cycle time for a 16-bit DMA access slower than an 8-bit one. Ideally, you'd want 16-bit DMA to be faster, since new cards using DMA will almost always utilize 16 bit transfers. But you have the legacy of the 5150 to deal with. So you have to make room for the horse in your new horseless carriage. So 16-bit DMA on the 5170 isn't as fast as you'd want.

eeguru · Jan 24, 2012

That didn't answer the question though. If you restrict DMA on a 80286 to the lower channel numbers, why is it orders of magnitude slower than an 8088? I still don't understand the logic behind Trixter's claim.

Chuck(G) · Jan 24, 2012

Orders of magnitude? 10x 100x 1000x? I don't think that's true at all. You'd never get the floppy drive to work.

Trixter · Jan 25, 2012

I may have been thinking about DMA memory-to-memory transfers. But never mind; at this point I'm willing to retract all of my previous statements as bohonkus because the last thing I want to do is piss off the people who are making the XT-IDE, which is quite honestly the best PC retrocomputing project in the last decade. Carry on!

pearce_jj · Jan 25, 2012

Trixter, discussion is always good - I for one have learnt something here anyway.

eeguru · Jan 25, 2012

You're not pissing anyone off here. I'm genuinely curious, as up until now my understanding of ISA timings has led me to conclude that using the CPU as basically a non-arbitrated DMA engine with a REP MOVSW instruction would keep the bus 100% saturated with nothing but data memory transfer cycles. I think a clock cycle or two can be saved per byte round trip on a DMA move; however, the arbitration overhead per sector eats into that savings. I've also assumed that a motherboard's wait-state generator is always downstream of the bus master, whether it be an 8088, 8237, or 80286. So even DMA cycles would add the same wait states needed to give memory or I/O devices the minimum data hold times they need.

I've also assumed that once you configure an I/O DMA move and the DMA controller arbitrates for the bus through HOLD/HLDA, as long as the I/O device keeps the request line active (as an IDE controller would with a sector in hand), the DMA controller keeps the bus until the transfer is over. And since the CPU doesn't have a large prefetch queue or any cache, there is no efficiency gain from letting it process anything in parallel.
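The trade-off in the last two paragraphs (a clock or so saved per byte versus arbitration overhead) can be written as a toy cost model. It compares a CPU copy loop, single-transfer DMA (arbitrate around every byte), and demand-mode DMA (arbitrate once and hold the bus while DRQ stays active). The 4 and 3 clocks/byte figures come from earlier in the thread; the 10-clock arbitration cost and all names are hypothetical placeholders.

```python
# Toy cost model: CPU copy vs. single-transfer DMA vs. demand-mode DMA.
# All clock values are HYPOTHETICAL placeholders, not measured figures.

from dataclasses import dataclass


@dataclass
class Costs:
    cpu_clocks_per_byte: float   # CPU string-instruction copy
    dma_clocks_per_byte: float   # raw DMA data cycle
    arbitration_clocks: float    # HRQ/HLDA handshake + setup, per bus grant


def cpu_total(n: int, c: Costs) -> float:
    return n * c.cpu_clocks_per_byte


def dma_single_total(n: int, c: Costs) -> float:
    # Bus is requested and released around every byte.
    return n * (c.arbitration_clocks + c.dma_clocks_per_byte)


def dma_demand_total(n: int, c: Costs) -> float:
    # Bus is granted once and held for the whole block.
    return c.arbitration_clocks + n * c.dma_clocks_per_byte


costs = Costs(cpu_clocks_per_byte=4, dma_clocks_per_byte=3, arbitration_clocks=10)
n = 512
for name, fn in (("CPU", cpu_total), ("DMA single", dma_single_total),
                 ("DMA demand", dma_demand_total)):
    print(f"{name:10s} {fn(n, costs):6.0f} clocks/sector")
```

In this model demand-mode DMA beats the CPU loop only while the one-off arbitration cost stays below n × (CPU − DMA) clocks, which is 512 clocks per sector with these numbers; single-transfer mode loses outright.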

I've also assumed that interrupts are pointless. Their main purpose is to let the CPU continue processing normally while waiting for the sector fetch delay or for a DMA cycle to complete. The former really isn't a problem with modern drives, especially flash-based ones. The latter is moot per the previous paragraph. Not to mention that most users are running a single-tasking OS anyway.

So I've dropped the IRQ lines and any future plans to implement DMA support. It just seems unnecessary. But some of my assumptions may be incorrect; that's why I'm asking. It's also why I get a bit defensive about any blanket statement like 'A must be better than B'. Well... OK, I'll believe you, just tell me why!

Chuck(G) · Jan 25, 2012

Since the XTIDE BIOS isn't reentrant, a completion interrupt is possibly pointless unless the "wait for completion" uses the 5170 interrupt 15h services to signal a wait for completion. I don't know if the XTIDE BIOS does; my guess is probably not. Since the XTIDE hardware interface isn't compatible with the 5160 MFM controller, it's doubtful that any multitasking software (such as Concurrent CP/M or MP/M-86) could make use of it on an XT.

Has anyone tried running CCP/M or MP/M-86 using the XTIDE?

Crypticalcode0 · Jan 27, 2012

Couldn't a DMA be accomplished by adding another CPU like the Z80?

eeguru · Jan 27, 2012

You can't bus master on an 8-bit ISA slot.

Crypticalcode0 · Jan 27, 2012

In any case, IIRC ISA isn't supposed to have a bus master other than the CPU; EISA is a whole different beast in this respect.
I've been thinking, and actually using custom parts like another CPU is a step back, since you would need to guarantee a steady supply of them (last I checked, Z80s are EOL).

This is just an idea rattling around in my head, and I'm not sure the specs allow it, but the 74ACT100 is worth a look for this: you'd need one of these to slow the EEPROM down to bus speed, another for an offset pointer, and a third could be added for safety if needed.
Then you can use an internal EEPROM to hold a program, with a small feedback loop from its data lines back to its own address lines that decrements until 0, while the rest of the data lines increment said pointer.
The latch would hold the address data until new data is sent in or it is reset, so it could be used as an offset pointer.
Total parts for DMA in ordinary logic: 1 EEPROM, 2 8-bit latches, some logic for address decoding from the bus to the CS line, and bus sync logic.
EEPROM CS is safe to tie to GND or VCC, as long as the latches are not placed in an unknown state.

I'm just rambling here, but that's about the simplest thing I can think of to do DMA with.
If someone can understand my core dump, I'm happy; if not, I can ANSI-art it out for you.

pearce_jj · Mar 14, 2012

Hi Chuck, you commented that you wouldn't do the mod this way (A0/A3, if designing from scratch) and that some additional circuitry could be devised to enable fast writes too. With the CPLD versions I wondered if this might become feasible; IIRC the DPv2 code uses only about half the CPLD logic space as-is.

Also, would you be prepared to make the source for the 'Chuck-mod' BIOS available?

hargle · Mar 14, 2012

The Universal XTIDE BIOS has support for the Chuck mod and is open source.
http://code.google.com/p/xtideuniversalbios/

I would hope that your card would use this BIOS, as it keeps everyone on the same codebase for bug fixes and features that we can all use, regardless of what silicon we're using.
This BIOS currently supports the original XTIDE, XTIDE rev2, and eeguru's CPLD designs, so you're the only one left to scoop up.

pearce_jj · Mar 14, 2012

It could, of course, and indeed I've read most of the info for the v2 BIOS with much interest. But what I'm really interested in is the circuit design to enable OUTSW to be used on a V20.

It's an aside, but I also personally like the lack of a boot menu in v0.11, and v1.15 messes with the MDA cursor (fixed now, I realise). I guess it would be easy enough to silence the menu, though.

Chuck(G) · Mar 14, 2012

pearce_jj said:
Hi Chuck, you commented that you wouldn't do the mod this way (A0/A3, if designing from scratch) and that some additional circuitry could be devised to enable fast writes too. With the CPLD versions I wondered if this might become feasible; IIRC the DPv2 code uses only about half the CPLD logic space as-is.

Also, would you be prepared to make the source for the 'Chuck-mod' BIOS available?

I think I described the mod much earlier in this thread--there weren't extensive mods to Hargle's code--just the data input and output routines and some port redefinitions.

If I were to do a CPLD version, the change that would be required would be to take into account that the write signal to the IDE interface needs to occur on the output of the high-order byte, while when reading, the read signal occurs on the input of the low-order byte. Both IN AX,DX and OUT DX,AX transfer the low-order byte first, then the high-order byte.
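The byte-ordering rule above can be modelled in a few lines. This is a behavioural sketch, not a CPLD design: on reads the IDE strobe fires on the low byte (which arrives first) and the high byte is latched; on writes the low byte is latched and the strobe fires on the high byte (which arrives second). The class and function names are hypothetical, chosen for illustration.

```python
# Behavioural model of an 8-bit-bus to 16-bit-IDE data-port bridge, following
# the byte ordering described above. Names are hypothetical.


class DataPortBridge:
    def __init__(self, drive_words):
        self.drive_words = list(drive_words)  # words the drive will supply
        self.written_words = []               # words delivered to the drive
        self.read_latch = 0                   # high byte held after a read strobe
        self.write_latch = 0                  # low byte held until the high write
        self.ide_read_strobes = 0
        self.ide_write_strobes = 0

    def read_byte(self, high: bool) -> int:
        if not high:
            # Low byte arrives first: the IDE read strobe fires here.
            word = self.drive_words.pop(0)
            self.ide_read_strobes += 1
            self.read_latch = word >> 8
            return word & 0xFF
        return self.read_latch               # high byte comes from the latch

    def write_byte(self, high: bool, value: int) -> None:
        if not high:
            self.write_latch = value & 0xFF  # hold the low byte
        else:
            # High byte arrives second: strobe the drive with the full word.
            self.written_words.append(((value & 0xFF) << 8) | self.write_latch)
            self.ide_write_strobes += 1


def in_ax(bridge: DataPortBridge) -> int:
    """IN AX,DX on an 8-bit bus: low byte first, then high byte."""
    lo = bridge.read_byte(high=False)
    hi = bridge.read_byte(high=True)
    return (hi << 8) | lo


def out_ax(bridge: DataPortBridge, ax: int) -> None:
    """OUT DX,AX on an 8-bit bus: low byte first, then high byte."""
    bridge.write_byte(high=False, value=ax & 0xFF)
    bridge.write_byte(high=True, value=(ax >> 8) & 0xFF)


bridge = DataPortBridge([0x1234, 0xABCD])
print(hex(in_ax(bridge)), hex(in_ax(bridge)))
out_ax(bridge, 0xBEEF)
print([hex(w) for w in bridge.written_words])
```

If the write strobe fired on the low byte instead, the drive would receive a word whose high half was stale, which is why the decode has to treat reads and writes differently.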

My mod was simple: it got reads running as fast as practical (there's no reason a test for a 186/V20 or better couldn't be made and INSW used) and sped up writes a bit (they typically account for only about 10% of total traffic) by making the byte port addresses contiguous.
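A small sketch of what swapping A0 and A3 does to the port map. It assumes (this is not stated in the thread) that the original XTIDE decode selects IDE registers 0-7 with A0-A2 and the high-byte data latch with A3, i.e. at base+8. After the swap, the data register and the latch sit at base+0 and base+1, which is what lets a word-sized IN AX,DX (port DX, then DX+1) reach both halves. Function names are hypothetical.

```python
# Port-offset decode before and after swapping ISA A0 and A3.
# ASSUMPTION: the original decode uses A0-A2 for IDE registers 0-7 and A3
# for the high-byte data latch (offset 8).


def swap_a0_a3(offset: int) -> int:
    """Exchange address bits 0 and 3 of a 4-bit port offset."""
    a0 = offset & 1
    a3 = (offset >> 3) & 1
    return (offset & 0b0110) | (a0 << 3) | a3


def describe(decoded: int) -> str:
    # decoded is the offset as the (unchanged) card logic sees it
    if decoded & 0b1000:
        return "data high-byte latch"
    return f"IDE register {decoded & 0b0111}"


for isa_offset in (0, 1, 8):
    print(f"base+{isa_offset}: original -> {describe(isa_offset):22s}"
          f" swapped -> {describe(swap_a0_a3(isa_offset))}")
```

The swap is its own inverse, so the same wiring change works in both directions; the other IDE registers move to new offsets, which is why the BIOS needs its port definitions changed as well.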

pearce_jj · Mar 14, 2012

Many thanks.

pearce_jj · Mar 15, 2012

Hi Chuck, I've been looking through the XTIDEv2 schematic thinking about writes via OUTSW. I've come up with swapping the operation of U4 inputs D0 & D1 for writes (i.e. NXOR with /IOW-IDE), so the PIO write command gets set on the second byte transferred.

Then dedicated 74LS245s for reads and writes, with the write-mode chip connected to D8..15 on the B side. Finally, U2 would buffer the low byte instead of the high, i.e. the opposite of the existing design.

Along the right lines?

Chuck(G) · Mar 16, 2012

I haven't looked at the V2 schematic in some months, but I think you've got the right idea.
