Please Help Debug Assembler Routine!

pearce_jj

I'm stuck with memory-mapped writes for my XTIDE-derived board, so I'd really appreciate some help checking the transfer code.

Linked to below is code I've added to the (otherwise unmodified) XTIDEv2 Beta BIOS. I've linked to it as it's quite readable in the mediawiki format, and of course it can be kept current.

If there is an error it presumably is near .WriteTransferLoop, since the routine works with port-based IO. Reads are working OK already.

Working Reads Code - This routine works with both port-based and memory-mapped transfers.

Possibly Bad Write Code - This routine works with port-based transfers, but when memory-mapped is attempted there is a brief blink of the drive LED, then a long pause, then the OS returns device not ready. After that reads can be undertaken OK.

Notes:
  • The board has a 'device ID' byte at port 30Fh, which returns '4'.
  • My 'technical reference' for the board has lots more details on how it is supposed to work, which I've summarised a bit below for the sake of brevity.

Many thanks!!

------------------------------------------------------------------------------------------------

Board Logic - Port Based IO

Assume port base 300h for the sake of clarity.

  • Reads via 300h/301h
  • Writes via 310h/311h (ISA A4 is used to distinguish reads & writes)
  • For reads, IDE /CS0 and /CS1 are set so that the drive sees a data transfer for port 300h and a no-op for port 301h. Hence the first read to 300h triggers a 16-bit transfer, the high byte being stored in a latch, and the second read (from 301h) transfers that byte from the latch. Programmatically, this is exactly per original XT/IDE 'chuck-mod' design.
  • Writes work in the same way, except that the /CS0 and /CS1 function is reversed (by A4 being high), so the drive sees the no-op on port 310h (allowing the low byte to be stored in the latch) and the data transferred (to the drive) on port 311h. Hence, words can be written to port 310h.
  • IDE address lines follow the 'chuck-mod' address line mapping:
  • IDE Control ports are via base 308h


Board Logic - Memory Mapped IO

A loadable register at port 30Fh is used by a second address decoder to match the high 8 bits of the ISA address bus. To prevent a clash with the port-based IO address decoder, the loaded value's MSB must be set (and similarly the port-based address decoder needs ISA A19 to be low).

Say D8h is loaded (via port 30Fh): the memory-mapped decoder will then watch for addresses D800:0000h to D800:03FFh, split, as with port-based IO, into two ranges:

  • 0000h to 01FFh for reads
  • 0200h to 03FFh for writes (ISA A9 is used to distinguish reads & writes)

ISA A0 is again used to control IDE /CS0 and /CS1, its function being reversed here by A9 to get the required IDE no-op ordering.

When the memory-mapped decoder matches, IDE DA0..2 are held low. Hence, the CPU should be able to transfer by "rep movsw" starting with D800:0000h for reads, and D800:0200h for writes.
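
For anyone following along, the window decode described above can be modelled roughly as below. This is only a sketch of the behaviour as stated in this post, not the actual CPLD logic; the function name and the "ide_transfer"/"latch_only" labels are made up for illustration, and any aliasing outside the 1 KB window is not modelled.

```python
def decode_mem(linear, window=0xD8):
    """Classify a 20-bit ISA address against the window byte loaded via port 30Fh."""
    if not window & 0x80:
        raise ValueError("window MSB must be set (ISA A19 high)")
    base = window << 12                  # D8h -> D8000h, i.e. D800:0000h
    offset = linear - base
    if not 0 <= offset <= 0x3FF:
        return None                      # outside the 1 KB window described above
    direction = "write" if offset & 0x200 else "read"    # ISA A9
    odd = bool(offset & 0x001)                           # ISA A0
    if direction == "read":
        # even address triggers the 16-bit IDE read, odd address returns the latch
        action = "latch_only" if odd else "ide_transfer"
    else:
        # even address fills the latch, odd address triggers the 16-bit IDE write
        action = "ide_transfer" if odd else "latch_only"
    return (direction, action)
```

With D8h loaded, D800:0200h (linear D8200h) comes out as a latch-only write and D800:0201h as the write that actually reaches the drive.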
 
Hi Chuck,

Those signals are used as the basis for the IDE interface read or write trigger and the logic buffer and latch gate control; they just can't be used to control IDE /CS0 and /CS1 since those need to be set as part of the address setup (with DA0..2), and completed before MEMR or MEMW are asserted. So, I use a different port or memory range for reads or writes:

So for reads,

  • 300h: /CS0=A, /CS1=N, DA0..2=N => PIO transfer from device triggered by /IOR, buffer loaded from IDE-DH, IDE-DL passed to PC-DB
  • 301h: /CS0=N, /CS1=A, DA0..2=N => DIOW/DIOR cycle ignored, buffer contents presented on PC-DB

But for writes it's the other way round:

  • 310h: /CS0=N, /CS1=A, DA0..2=N => DIOW/DIOR cycle ignored, buffer loaded from PC-DB
  • 311h: /CS0=A, /CS1=N, DA0..2=N => PIO transfer to device triggered by /IOW, buffer presented to IDE-DL & IDE-DH from PC-DB

For memory-mapped it's the same except the controller holds IDE DA0..2 low as the CPU cycles through the ranges (and of course /MEMR or /MEMW replace /IOR or /IOW in the above). BTW by IDE-DL I mean IDE interface D0..7 and D8..15 for IDE-DH.
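
To make the byte ordering above concrete, here is a toy model of the latch behaviour for the port-based case (base 300h assumed). The class and attribute names are hypothetical, and it ignores timing and the control registers entirely; it only shows which access moves a word to or from the drive.

```python
class LatchedIdePort:
    """Toy model: 16-bit IDE data register behind an 8-bit bus with one byte latch."""

    def __init__(self, words_from_drive=()):
        self.from_drive = list(words_from_drive)   # words the drive will hand back
        self.to_drive = []                         # words the drive has received
        self.latch = 0

    def read(self, port):
        if port == 0x300:            # IDE read cycle: latch IDE-DH, pass IDE-DL to the PC
            word = self.from_drive.pop(0)
            self.latch = word >> 8
            return word & 0xFF
        if port == 0x301:            # no IDE cycle: present the latched high byte
            return self.latch
        raise ValueError(hex(port))

    def write(self, port, value):
        if port == 0x310:            # no IDE cycle: hold the low byte in the latch
            self.latch = value & 0xFF
        elif port == 0x311:          # IDE write cycle: latch -> IDE-DL, bus byte -> IDE-DH
            self.to_drive.append(((value & 0xFF) << 8) | self.latch)
        else:
            raise ValueError(hex(port))
```

Reading 300h then 301h returns the low then high byte of one drive word; writing 310h then 311h sends one word, which is the order an 8088 produces for a word access starting on the even address.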

Many thanks!
 
The code itself doesn't seem to be an issue--just make sure that you've done a "CLD" somewhere to make certain of the direction (increment vs. decrement).

First stupid question: Are you running this on an 8-bit system?
Second stupid question: Will memory refresh DMA mess you up?
Third stupid question: Do you have a logic analyzer?
 
Thanks, your time and expertise are very much appreciated.

So I guess the issue could be in the CPLD code. In answer to your questions - yes, don't know, no!
 
I trust that you've read this document.

One thing to try would be to do byte-at-a-time memory transfer; store the low byte, store the high byte, decrement your count, lather, rinse, repeat--just to see if the BIU isn't doing things the way you expect.

One way to see if it's DMA refresh is to disable channel 0 DMA (and interrupts) during the memory transfer, then reenable both after. The short time for a 256 word transfer shouldn't matter with memory integrity.
 
Yes indeed, been working from that.

Thanks for the ideas, will give that a whirl - perhaps just changing rep movsw to rep movsb might do the same (and doubling CX)?
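
A rough way to see why that swap should look the same to the card: the sketch below lists the destination addresses written in each case. It assumes the usual 8088 behaviour of splitting each word access into two byte bus cycles, low address first; the function names are made up, and the source-side reads (which do interleave differently) are not modelled.

```python
def movsw_writes(di, words):
    """Destination byte addresses written by REP MOVSW on an 8-bit bus (assumed order)."""
    cycles = []
    for _ in range(words):
        cycles += [di, di + 1]    # assumed: low byte first, then high byte
        di += 2
    return cycles


def movsb_writes(di, count):
    """Destination byte addresses written by REP MOVSB."""
    return [di + i for i in range(count)]
```

For one sector into the write window, `movsw_writes(0x200, 256)` and `movsb_writes(0x200, 512)` give the same sequence, ending at 03FFh.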
 
i noticed that you don't clear the direction flag before the read loop. also, can't you change this code:

Code:
	mov		ax, 0x0200	; offset for writes (A9=1)
	push		bx		; save register value
	mov		bx, cx		; store total word count in BX, to keep AX as-is
	cld				; clear direction flag
.WriteTransferLoop:			; sector-by-sector transfer loop (256 words each iteration)
	mov		di, ax		; ES:DI now has mem-mapped IO start address for writes
	mov		cx, 256		; memory-mapped window is 512 bytes, CX now has 256 (words)
	sub		bx, cx		; reduce number of words left to get after this iteration
	rep		movsw		; do the transfer from DS:SI to ES:DI, count in CX (256)
	cmp		bx, 0		; any more to do?
	jne	.WriteTransferLoop	; repeat, if there's more to do
	xor		al, al		; set AL to zero
	out		dx, al		; disable memory-mapped IO
	pop		bx		;


to this?

Code:
	mov		ax, cx		; store total word count in AX
	cld				; clear direction flag
.WriteTransferLoop:			; sector-by-sector transfer loop (256 words each iteration)
	mov		di, 0x0200	; ES:DI now has mem-mapped IO start address for writes
	mov		cx, 256		; memory-mapped window is 512 bytes, CX now has 256 (words)
	sub		ax, cx		; reduce number of words left to get after this iteration
	rep		movsw		; do the transfer from DS:SI to ES:DI, count in CX (256)
	cmp		ax, 0		; any more to do?
	jne	.WriteTransferLoop	; repeat, if there's more to do
	out		dx, al		; AL is already zero here (AX counted down to 0): disable memory-mapped IO


i also don't see DS being set to the source segment anywhere. do you need to do that, or is DS already set correctly on entry?
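
A quick register-level trace of the shortened loop above, as a sketch only (the function name and the guard are made up for illustration). It assumes CX holds the total word count on entry and models 16-bit wraparound. It shows that DI never leaves the 0200h-03FFh write window, and that AL is only zero for the final OUT when the count is a multiple of 256 words.

```python
def run_write_loop(total_words):
    """Trace the shortened loop; returns (sectors, final DI, AL sent to the port)."""
    ax = total_words & 0xFFFF          # mov ax, cx
    sectors = 0
    while True:
        di = 0x0200                    # mov di, 0x0200
        cx = 256                       # mov cx, 256
        ax = (ax - cx) & 0xFFFF        # sub ax, cx
        di += 2 * cx                   # rep movsw: DI advances 2 bytes per word
        cx = 0                         # ...and CX ends at zero
        sectors += 1
        if ax == 0:                    # cmp ax, 0 / jne .WriteTransferLoop
            break
        if sectors > 256:              # guard: the real loop would never terminate
            raise ValueError("count is not a multiple of 256 words")
    return sectors, di, ax & 0xFF
```

`run_write_loop(1024)` gives 4 sectors with DI ending at 0400h (one past the window) and AL = 0 each time the loop exits.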
 
pearce_jj said:
Yes indeed, been working from that.

Thanks for the ideas, will give that a whirl - perhaps just changing rep movsw to rep movsb might do the same (and doubling CX)?

I'd recommend staying away from rep mov initially.

As a test, I'd use "lodsb" followed by "stosb", just to make sure that the BIU isn't messing things up.

Also, does the operation fail with just a single-sector write?

You can shave a byte out of your loop by using OR BX,BX (or AND BX,BX) instead of the CMP rX,0. But that has nothing to do with your loop's operation.

@Mike, it shouldn't matter what DS is for a write, unless DS=ES--he'll write something to the drive, even if it isn't what he wants.
 
yeah, i know some sort of data would still get transferred but i just wanted to point it out in case it's something he missed.
 
Thanks so much for the input on this - really, very much appreciated.

Re DS - ES is moved to DS at the start of the block. For whatever reason, for writes the BIOS code is passing in the source via ES:SI. Thanks for the tip about OR instead of CMP (not done that yet, but I did save a byte too by changing mov cx,256 to mov ch,1).

Anyway after going over this again and again and completely failing to get my head around the DRAM refresh thing, or find a problem in the logic, I thought I'd just try and assemble it again.... and success! I made a few tweaks (eliminated the need for BX) but nothing really of note (wiki updated though) other than including CLD. I can only think that the version I was testing was corrupting DI (though I thought I'd re-flashed after finding that yesterday already).

Anyway, many thanks!
 
Just to add: I don't think you need to worry about DRAM refresh timing - on refresh cycles the DMA controller puts addresses between 0x00000 and 0x0ffff onto the bus (which is enough to get all the rows), so if the hardware is looking for accesses to 0xd8000-0xd83ff it won't see the refreshes.
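
That argument can be checked mechanically. The sketch below assumes, as stated above, that refresh addresses stay within 00000h-0FFFFh, and, per the board description, that the window byte matches ISA A19..A12 and must have its MSB set. The function name is made up.

```python
def window_matches(linear, window):
    """True if the high 8 bits of a 20-bit address equal the loaded window byte."""
    return (linear >> 12) & 0xFF == window


# One address per 4 KB page is enough, since only A19..A12 are compared.
refresh_pages = range(0x00000, 0x10000, 0x1000)
valid_windows = range(0x80, 0x100)            # MSB set, as the board requires

refresh_hits = any(
    window_matches(addr, win) for win in valid_windows for addr in refresh_pages
)
```

`refresh_hits` comes out False: no legal window value can match a refresh address under these assumptions.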
 
Many thanks for posting this. The high 8 bits can be loaded into the card, but A19 must be high. So looks OK then :)
 
You can remove the OR AX, AX instructions; the zero flag has already been set by SUB AX, CX and REP MOVSW preserves the flags. Also, CLD is redundant since that's already been done on entry to the Int13h handler.

I'm curious about the performance without your mod (assuming the current BIOS revision supports your card, I have a hard time keeping track of what's what with all the different hardware versions...)?

BTW, I like that the card supports both port based and memory-mapped I/O. Very nice work altogether! :)
 
Thanks!

The controller can read with 'chuckmod' BIOS (but can't write) so I adjusted my disk tester to include a read-only option and tested that across three BIOS variants, all v2-beta based. These results are from a PC/XT with a Seagate ST1 Microdrive, collected via disktest 2.0 on DOS 6.22, with a 4MB test file on a 2GB partition, no config.sys file present:

ChuckMod.bin: 237 KB/s
Portio2.bin: 250 KB/s
Memmap3.bin: 316 KB/s

Results agree to a stopwatch. The microdrive is consistently a little quicker than the compact flash cards I have; maybe some tweaking in the BIOS could improve them? Anyway, 'PortIO2.bin' is slightly faster than the chuckmod because the transfer loop is unrolled more in it.
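
For a sense of scale, the relative gains implied by the three figures above work out as follows (plain arithmetic on the posted numbers; the helper name is made up):

```python
results_kb_s = {"ChuckMod.bin": 237, "Portio2.bin": 250, "Memmap3.bin": 316}


def gain_percent(name, baseline="ChuckMod.bin"):
    """Percentage speed-up of one BIOS build over another, to one decimal place."""
    return round(100.0 * (results_kb_s[name] / results_kb_s[baseline] - 1), 1)


port_gain = gain_percent("Portio2.bin")                   # vs ChuckMod
mem_gain = gain_percent("Memmap3.bin")                    # vs ChuckMod
mem_vs_port = gain_percent("Memmap3.bin", "Portio2.bin")  # memory-mapped vs port IO
```

That is roughly 5.5% for the unrolled port-IO build and about 33% for the memory-mapped build over ChuckMod, or about 26% for memory-mapped over port IO.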

For anyone interested, I've posted more info on the CPLD logic with schematics today, here.

Thanks for the tip on the flag; I'll check that for sure!

Re supporting both, the idea was to run in port-mapped 'by default' (since there aren't any free switches to select a memory-mapped base address and it's not necessary for V20 anyway), then have a utility to set the memory-mapped base address in the BIOS on-the-fly, so the transfer base address can be set as (and if) required to suit the system.
 
A little suggestion - move the xchg of cl and ch out of the startup and into just the port version, so cx remains count... then use "dec ax" instead of the subtract. Smaller opcode should really get things moving. likewise the "mov ch,1" is done to a zero'd register, so how about "inc ch" instead? then move that part after the rep and you can lose the "or" altogether.

Code:
; assumes ax contains number of 512 byte blocks

.WriteTransferLoop:  ; sector-by-sector transfer loop (256 words each iteration)
  ; cx always zero at this point
  mov   di,cx        ; clear DI - ES:DI now has mem-mapped IO write base address
  inc   ch           ; inc ch means cx=256
  rep   movsw        ; do the transfer from [DS:SI] to [ES:DI], count in CX (256)
  dec   ax           ; dec faster
  jnz   .WriteTransferLoop  ; repeat, if there's more to do

In your original all those 2 byte opcodes leave the cpu waiting on the BIU to the tune of 4-5 clocks per operation. I'd guesstimate 28 clocks or so per loop 'wasted'. The above change should shave off about 8 clocks of waiting for the BIU on an 8088, as well as 4 execution clocks. Since CX is always zero that mov you'd think would save one clock, but that's really 'lost' to the BIU fetch. Still, it is generally the faster opcode so...

Oh, on this:
and dl,0xf0 ; restore value in DX (the base IO port)
or dl,0x10 ; writes are via port Base+10h

Shouldn't that be an add, not an or? What if it's on an odd address? I don't know what port you're starting at typically, but with normal IDE locations like 170h or 1F0h the or isn't going to give you the needed 180h or 200h. I mean, if your device is only going to fall on a 32-port boundary (320, 340, etc.) then you're probably fine... I'd probably use the add anyway just to be sure. Yeah, if this is for IDE at 170h, the port version is busted.
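
The difference is easy to see with a few values. In the sketch below both helpers take the current DX, clear the low nibble as the AND does, then apply either the OR or an add. The helper names are made up. Note that the add is modelled on the full 16-bit DX, since an 8-bit add on DL alone would wrap at 1F0h instead of carrying into DH.

```python
def write_port_or(dx):
    """and dl, 0xf0 / or dl, 0x10"""
    dl = ((dx & 0xFF) & 0xF0) | 0x10
    return (dx & 0xFF00) | dl


def write_port_add(dx):
    """and dl, 0xf0 / add dx, 0x10 (16-bit add, so a carry reaches DH)"""
    return ((dx & 0xFFF0) + 0x10) & 0xFFFF


# Bases the DIP switches are said to allow: 200h..3E0h in steps of 20h
aligned_bases = range(0x200, 0x400, 0x20)
same_on_aligned = all(write_port_or(b) == write_port_add(b) for b in aligned_bases)
```

On 32-port-aligned bases the two agree; at 170h the OR leaves the port at 170h where the add gives 180h, and at 1F0h the add gives 200h.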

Side note -- shame you can't OUTSW on an 8088... of course that's why IOMEGA zip drivers don't work on same.
 
Many thanks indeed for posting this in such detail - I've made the changes and will test later.

Re port mapping, the comments in the code could be misleading; the logic needs A4 high for writes (and low for reads), so in general the board must always be aligned on a 32-port boundary as you guessed. The DIP switches provide such configuration between 200h and 3E0h (the same as the original XT/IDE I think).
 
pearce_jj said:
The controller can read with 'chuckmod' BIOS (but can't write) so I adjusted my disk tester to include a read-only option and tested that across three BIOS variants, all v2-beta based.

It looks like Tomi made a change to support writes with your card in r402 so I guess you're using the official beta and not the latest revision from the SVN repository?

pearce_jj said:
These results from PC/XT with a Seagate ST1 Microdrive, collected via disktest 2.0 on DOS 6.22, with a 4MB test file on a 2GB partition, no config.sys file present:

ChuckMod.bin: 237 KB/s
Portio2.bin: 250 KB/s
Memmap3.bin: 316 KB/s

Results agree to a stopwatch.

Very interesting! I'm surprised to see that unrolling the transfer loop makes such a big difference. We should probably look into unrolling more...

@deathshadow: All very good advice! Note that INC CH is still a 2 byte instruction though (not sure if you meant otherwise) but it does execute in 1 cycle less on 808x CPUs so it should be a tiny bit faster than MOV CH, 1.

pearce_jj said:
Re port mapping, the comments in the code could be misleading; the logic needs A4 high for writes (and low for reads), so in general the board must always be aligned on a 32-port boundary as you guessed. The DIP switches provide such configuration between 200h and 3E0h (the same as the original XT/IDE I think).

This means you should be able to replace this:
Code:
	and		dl, 0xf0	; restore value in DX (the base IO port)
	or		dl, 0x10	; writes are via port Base+10h
with this:
Code:
	xor		dl, 0x1f
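
A small check of when the single XOR matches the AND/OR pair. It rests on two assumptions: the base sits on a 32-port boundary (bit 4 of DL clear), and DL holds base+0Fh at that point in the code (the register port). The second is inferred from the suggestion itself rather than confirmed in the thread. Function names are made up.

```python
def and_or(dl):
    """and dl, 0xf0 / or dl, 0x10"""
    return (dl & 0xF0) | 0x10


def xor_1f(dl):
    """xor dl, 0x1f"""
    return dl ^ 0x1F


# Low bytes of 32-port-aligned bases: 00h, 20h, ..., E0h
aligned_low_bytes = range(0x00, 0x100, 0x20)

# Assumed entry state: DL = base + 0Fh
equivalent = all(and_or(b + 0x0F) == xor_1f(b + 0x0F) for b in aligned_low_bytes)

# Counter-example: if DL still held the bare base, the XOR would not land on base+10h
differs_from_base = and_or(0x00) != xor_1f(0x00)
```

So the one-instruction form is only a drop-in replacement if DL really is base+0Fh on entry; otherwise the two-instruction version is the safer one.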
 
Thanks so much for all the input on this - I really appreciate it.

I've updated the routines; there seems to be a bit of gain in the writes, 306KB/s peak. Reads are a little more erratic so harder to tell.

CF card throughput is bugging me though - typical numbers are 255KB/s read and 235KB/s write. Following the code, it looks like the only wait can be polling the drive for data to be ready. Is it possible that the BIOS code is somehow taking a (relatively) long time between polls? If so, the ST1 would be a lot quicker, as its data would be available immediately: the read-ahead easily picks up the sequential read, and the lazy writer accepts data straight to cache. I can't think of any other explanation.
 